Preface



In seminars and graduate-level courses I have had several opportunities to
discuss the modeling and analysis of time series with economists and economics
graduate students during the past several years. These experiences made me aware
of a gap between what economics graduate students are taught about vector-valued
time series and what is available in the recent system literature.

Wishing to fill or narrow this gap, which I suspect is more widespread than
my personal experiences indicate, I have written these notes to augment and
reorganize materials I have given in these courses and seminars. I have endeavored
to present, in as self-contained a way as practicable, a body of results and
techniques in system theory that I judge to be relevant and useful to economists
interested in using time series in their research. I have essentially acted as
an intermediary and interpreter of system theoretic results and perspectives on
time series, filtering out non-essential details and presenting coherent accounts
of what I deem to be important but not readily available or accessible to economists.
For this reason I have excluded from the notes many results on various estimation
methods and their statistical properties, because they are amply discussed in many
standard texts on time series or on statistics.

The notes naturally divide into three parts. Chapters 1 through 6 are
preparatory to the main part of the notes. The notion of state, which is basic in
representing time series by Markovian models, is introduced early, in Chapter 2.
Chapter 3 describes time-invariant dynamic systems, i.e., systems whose properties
remain invariant with respect to shifts of the origin of the time axis, which we
mostly use to represent time series, after suitable processing of data if necessary.
Here, locations of zeros of the numerator and denominator polynomials of transfer
functions are related to the notions of inverse systems, stable systems and minimum
phase systems, the last appearing prominently in our later chapters. Several ways
to represent time series are taken up in Chapters 4 and 5. Chapter 6 considers
preliminary processing of time series data to fit economic data series into a
common framework of mean zero, finite covariance, weakly stationary stochastic
processes. Chapters 7 through 10 constitute the main part of these notes. There
I use singular value decomposition of certain matrices made up of covariances of
data vectors to produce Markovian models that generate time-indexed data vectors.
These models, after further refinement by maximum likelihood procedures if necessary,
can be used to predict future values of the data vectors. The connection of this
method with the canonical correlation method of Akaike is also explained. Chapter 11,
on time series from intertemporal optimization, may be of particular interest to some
macroeconomists in view of recent research interest in explaining business cycles
using equilibrium macroeconomic models. Identification of closed-loop systems
and time series generated by dynamic models incorporating rational expectations are
the final two topics of the lecture notes. Chapter 14 is the third part of the
notes and contains several numerical examples, mostly drawn from Japanese economic
time series.

To help bridge the gap or barrier faced by someone who is not versed in the
system theoretic language, I have collected as mathematical appendices a number of
brief but mostly self-contained accounts of the facts I use in the main body.

In preparing these notes, the author received help from many friends and
colleagues. Sean Becketti of the University of California, Los Angeles and Hiroshi
Yoshikawa of the Institute of Social and Economic Research, Osaka University
commented on an earlier draft. Leonard Silverman of the Department of Electrical
Engineering, University of Southern California showed me his unpublished report.
Jorma Rissanen of IBM, San Jose told me of several important recent works on time
series analysis. I owe Quirino Paris of the University of California, Davis a
reference. Dr. Hirotsugu Akaike of the Institute of Statistical Mathematics, Tokyo
made available to the author the computer programs that implement his AIC criterion.
Arthur Havenner of the University of California, Davis was instrumental in organizing
a series of seminars at which some of the material in preliminary form was tried out.
He also provided most useful comments on an earlier draft. Axel Leijonhufvud of the
University of California, Los Angeles helped the author by arranging for his visits
to the Department of Economics at the University of California, Los Angeles, where a
preliminary version of the notes was tried out in a graduate-level economics course.

These notes were typed expertly and expeditiously by Ms. Y. Ishida, T. Kawata,
K. Uto and G. Nystrom. Computations were carried out by Messrs. H. Ebara, S. Tateishi,
K. Nakagawa and Ms. C. Baden.



Osaka




1 INTRODUCTION



Time series arise when data are collected over time, either continuously or
at discrete time instants, and usually on several related variables. Together
they produce vector-valued, time-indexed data which record economic activities.
Economic data are often collected at regular intervals: daily, weekly,
monthly, etc.

We analyze data jointly rather than singly, i.e., as vectors rather than as
scalars,

(1) to uncover dynamic or structural relations among them, because some series
may lead or lag other series, and there may be feedback between them,

and ultimately

(2) to forecast better, because modeling a collection of time series as a
vector-valued series uses related information in the data jointly.

The study of time series has a history much older than that of modern system
theory. Probabilists, statisticians and econometricians have all contributed to
advancing our understanding of time series over the past several decades. Many
well-established books record their contributions. One may wonder what new results
system theory can add to this well-established field, and doubt whether any new
perspective or insight can be gained from this relative newcomer to the field. The
history of science shows us, however, that the same problems can be, and have been,
examined with advantage by different disciplines, partly because implications of
alternative assumptions are explored by researchers with different backgrounds or
interests, and partly because new techniques developed elsewhere are brought in to
explore areas left untouched by the discipline in which the problem originated.
Although a latecomer to the field of time series analysis, system theory has brought
a set of viewpoints, concepts and tools slightly different from the traditional
ones, and they are effective in dealing with vector-valued time-indexed data.

We believe that system theory has provided new perspectives and contributed
new results, especially on vector-valued time series, that are of potential interest
to economists who must deal with time series but are not experts in time series
analysis and modeling. Because these results and perspectives are stated in a
language unfamiliar to economists, our primary objective in writing this set of
notes is to help them overcome the language barrier, and to make these results and
new tools accessible so that economists may benefit from system theory in their
own research.

What are the new perspectives and results we speak of? First, how do we
represent time series? Loosely put, traditional time series analysis is primarily
directed toward scalar-valued data, and usually represents time series by scalar
autoregressive, moving average or autoregressive-moving average models. We
provide alternative modes of representing vector-valued time-indexed data, and
directly connect exogenous variables at several time points with endogenous
variables, also at several time instants. In one mode, this connection is expressed
by means of transfer function matrices, which relate the input time series
to the output time series. The classical control literature also made much use
of the transfer functions of dynamic systems. Modern control and system theory
improves on the classical results: it handles several variables simultaneously
as vector-valued variables, and it has introduced an alternative mode of representing
dynamic phenomena, called the state space or Markovian representation, by defining
internal or state space variables as useful auxiliary variables. Although these
two ways of representing dynamic phenomena are equivalent, they have their own
advantages and disadvantages. Having two alternative ways of dealing with
vector-valued time series is definitely worthwhile.

Prompted by this different viewpoint on dynamic phenomena, or different
mode of describing them, and by the necessity of paying greater attention
to the interrelation of vector components, system theory has introduced theoretical
notions that are nontrivial only for vector-valued time series, such as
reachability, observability, and minimal realization, which are not found
in the traditional, i.e., scalar-value oriented, time series literature. These
notions turn out to be rather significant in many considerations on modeling
time series by minimal dimensional Markovian representations and in examining
the robustness of various algorithms for identification. For example, as we later
show, the problem of common factors in the AR portion and MA portion of ARMA
models is exactly that of minimal realization of given time series by Markovian
models.

Secondly, given a time series represented in state space form, how do we
construct a particular model, i.e., pick dimensions and estimate parameters? How
do we test for identifiability? We use canonical correlations, or equivalently
singular value decomposition, of matrices composed of covariances, known as Hankel
matrices. We relate Hankel matrices to the construction of minimal dimensional
Markovian representations of vector-valued time series, and state identifiability
of closed-loop dynamics in terms of a system notion called return differences of
feedback systems.

In this set of notes I try to highlight a few aspects of the analysis and
modeling of time series that are primarily system theoretic in origin or orientation,
in order to provide some new perspectives or analytical techniques. I have chosen
Hankel matrices as a unifying theme to treat time series prediction, representation
of data by (lower order) state space models, and examination of identification and
identifiability conditions from a common viewpoint. Singular value decomposition
of certain Hankel matrices followed by suitable scaling produces so-called
internally-balanced state space models of vector-valued time series. These models may
be refined further, if needed, by additional parameter estimation steps that maximize
likelihood functions suitably modified to account for the number of parameters
in the models. Two criteria due to Akaike and Rissanen are discussed.




2 THE NOTION OF STATE 



Time series are regarded as being generated by Markovian or state
space dynamic systems. The concept of state is one of the key notions in
dynamics. State is not a topic routinely discussed by economists or
econometricians. (Harvey [1981] seems to be the only book on time series written
by econometricians that mentions state space. Even he devotes only one
chapter to this topic, however.) State naturally arises in problems of
optimization over time. It is not an artificial mathematical construct to burden
economists unnecessarily. Bellman and Dreyfus [1962] discuss many deterministic
and stochastic examples illustrating this fact. See Chapter 11 as well.

Loosely put, a state vector of a deterministic dynamic system is a
minimum collection of information necessary to uniquely "determine" the
future evolution of the dynamic system, given future time paths of all
relevant exogenous variables affecting the system, including decision or
choice variables. For example, in a system governed by $z_{t+1} = f(z_t, x_t)$,
where $x_t$ is an exogenous variable, the vector $z_t$ is a state vector of this
system, because $z_{t+1}$ is uniquely determined by $z_t$ and $x_t$.

Suppose dynamic equations involve some predetermined, i.e., lagged
endogenous variables. An example is a system described by

\[ z_{t+1} = f_t(z_t, z_{t-1}, x_{t-1}, x_t). \]

The values of $z_{t-1}$ and $x_{t-1}$ must additionally be known before $z_{t+1}$ can
be uniquely determined. Evidently $z_t$ alone is not a state vector of this
system. Introduce additional information by defining a vector $s_t$ by

\[ s_t = \begin{pmatrix} z_t \\ y_t \\ w_t \end{pmatrix}, \]

where

\[ y_t = z_{t-1} \quad \text{and} \quad w_t = x_{t-1}. \]

Then the relation

\[ s_{t+1} = \begin{pmatrix} z_{t+1} \\ y_{t+1} \\ w_{t+1} \end{pmatrix}
           = \begin{pmatrix} f_t(z_t, y_t, w_t, x_t) \\ z_t \\ x_t \end{pmatrix}
           = F_t(s_t, x_t), \]

where $F_t(\cdot, \cdot)$ is defined by the above identity, shows that knowledge of
$s_t$ plus the values of current and future exogenous variables $x_\tau$, $\tau \ge t$,
suffices to determine $s_\tau$, hence $z_\tau$, $\tau > t$, uniquely. Thus the vector $s_t$
qualifies as a state vector for this dynamic system. Chapter 5 describes
systematic methods for converting linear dynamic models such as ARMA models
into state space form where state vectors may contain some lagged endogenous
variables. The above example shows that a similar procedure works for
nonlinear systems as well.
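
The augmentation can be checked numerically. The following Python sketch uses a hypothetical map `f` (its functional form and coefficients are assumptions chosen only for illustration, not taken from these notes) and verifies that iterating the first order map `F` on the augmented state $(z_t, y_t, w_t)$ reproduces the path of the original recursion with lagged variables.

```python
def f(z, z_lag, x_lag, x):
    """Hypothetical dynamics z_{t+1} = f(z_t, z_{t-1}, x_{t-1}, x_t).

    The linear form and the coefficients are illustrative assumptions.
    """
    return 0.5 * z - 0.2 * z_lag + 0.3 * x_lag + x


def F(s, x):
    """First order map on s_t = (z_t, y_t, w_t), with y_t = z_{t-1}, w_t = x_{t-1}."""
    z, y, w = s
    return (f(z, y, w, x), z, x)


def simulate_direct(z0, z_minus1, x_minus1, xs):
    """Iterate the original recursion, carrying the lagged values by hand."""
    path = []
    z, z_lag, x_lag = z0, z_minus1, x_minus1
    for x in xs:
        z, z_lag, x_lag = f(z, z_lag, x_lag, x), z, x
        path.append(z)
    return path


def simulate_state(s0, xs):
    """Iterate s_{t+1} = F(s_t, x_t) and record the first component z_{t+1}."""
    path = []
    s = s0
    for x in xs:
        s = F(s, x)
        path.append(s[0])
    return path


xs = [1.0, 0.0, -1.0, 2.0, 0.5]
direct = simulate_direct(1.0, 0.5, 0.0, xs)
via_state = simulate_state((1.0, 0.5, 0.0), xs)
```

Both simulations need the same three initial numbers; the state form simply collects them in one vector so that the recursion becomes first order.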

When stochastic processes are involved, we must properly reinterpret
the phrase uniquely "determine" in our description of the notion of state.
In stochastic systems, probability laws for evolution are in general the best one
can specify to determine uniquely the future evolution of the dynamic system.
In special cases where the probability laws can be specified by a
few statistics such as first or second order moments, these statistics can serve
as finite-dimensional state vectors. Otherwise the state vector becomes
infinite-dimensional.




3 TIME-INVARIANT LINEAR DYNAMICS



Dynamic systems or their models determine the time paths of endogenous
variables for given time paths of exogenous variables. If the models are
represented by a (set of) differential or difference equations, then endogenous
variables are obtained as the solutions of these equations using exogenous
variables as the right-hand terms, i.e., as the input or forcing variables, as
they are called in the system literature.

Dynamics are called time-invariant or time-homogeneous if the system
characteristics do not change with time. In other words, a stationary output
$y(t+\tau)$ results in response to the input $u(t+\tau)$ whenever $u(t)$ causes the
stationary output $y(t)$, i.e., time translation of input signals merely translates
the output time function by the same amount. Time-invariance corresponds to
the notion of stationarity in stochastic processes, where probability laws or
moments are invariant with respect to translation of the processes along the
time axis.

Dynamic systems are called linear if endogenous variables (outputs)
$\alpha_1 y_1(t) + \alpha_2 y_2(t)$ correspond to exogenous variables (inputs)
$\alpha_1 u_1(t) + \alpha_2 u_2(t)$, where $y_i(t)$ is the endogenous variable
corresponding to $u_i(t)$ alone, $i = 1, 2$. We say that the superposition principle
holds for linear systems.
Stability is implicitly assumed in discussing stationary outputs. Effects
of nonzero initial state (initial conditions) die out with time for stable
systems. Actually, this is more or less what we mean by stable dynamics.
See Aoki [1976, Chapter 4] for a more precise discussion of stability.

3.1 Continuous Time Systems

Because periodic signals have Fourier series representations, and because the
superposition principle holds for linear dynamic systems, we know the response of a
linear, time-invariant dynamic system to any periodic input signal once we know
the output to a stationary input $u(t) = e^{j\omega t}$. Denote it by $y(t)$. Then, by
linearity, the input $u(t+\tau) = e^{j\omega(t+\tau)} = e^{j\omega\tau} e^{j\omega t}$ produces the
stationary output $e^{j\omega\tau} y(t)$. On the other hand, by time invariance, $y(t+\tau)$
is the stationary output produced by $u(t+\tau)$. Hence, assuming uniqueness of
stationary outputs, $y(t+\tau) = e^{j\omega\tau} y(t)$. Setting $t$ to zero and replacing $\tau$
by $t$ yields the relation

\[ y(t) = e^{j\omega t} y(0). \]

This equation shows that the stationary output produced by the input $e^{j\omega t}$ is a
constant multiple of the input. We replace the constant $y(0)$ by $H(j\omega)$ to show
the explicit dependence of the constant (i.e., independent of $t$) on $j\omega$. This
expression $H(j\omega)$, which is a complex number in general, is called the frequency
response function and shows how the system responds to signals of different
frequencies. For a general input $u(t)$, express it by its Fourier transform as

\[ u(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} U(j\omega) e^{j\omega t}\, d\omega, \]

i.e., the signal is made up of periodic signals $e^{j\omega t}$ with amplitudes $U(j\omega)/2\pi$.
Because $e^{j\omega t}$ produces the output $H(j\omega) e^{j\omega t}$, by the superposition principle
the signal $u(t)$ produces the output

\[ y(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} H(j\omega) U(j\omega) e^{j\omega t}\, d\omega. \]

Taking its Fourier transform, we can express the above as

\[ Y(j\omega) = H(j\omega) U(j\omega). \]

The same relation holds when $j\omega$ is replaced by a general complex number
$s = \sigma + j\omega$:

(1)    \[ Y(s) = H(s) U(s). \]

This $H(s)$ is called the transfer function.

In the time domain a convolution of a function $h(\cdot)$ with the input $u(\cdot)$
represents the above relation,

\[ y(t) = \int_{-\infty}^{\infty} h(\tau) u(t-\tau)\, d\tau = \int_{-\infty}^{\infty} h(t-\tau) u(\tau)\, d\tau, \]

where $h(t)$ is obtained by the inverse Laplace transform of $H(s)$ and is called the
impulse response function.

We say a dynamic system is causal when its impulse response function
vanishes for negative time arguments, $h(t) = 0$ for $t < 0$. Referring to the
convolution expression above, the output $y(t)$ of a causal system is determined by
inputs $u(\tau)$, $\tau \le t$, only, i.e., any future signal $u(t+s)$, $s > 0$, does not
affect the value of $y(t)$; hence the system is called causal.

For continuous dynamic processes, the state space representation takes the
form of a first order vector differential equation

(2)    \[ \frac{dx}{dt} = Ax + Bu. \]

Its solution consists of the zero-input solution (the solution of the homogeneous
part with $u = 0$) and the zero-state solution, which is the solution corresponding
to the zero initial condition $x(0) = 0$. Suppose that $A$ is a constant matrix. Then
the solution of (2) is given by

(3)    \[ x(t) = e^{At} x(0) + \int_0^t e^{A(t-\tau)} B u(\tau)\, d\tau. \]

This can be readily verified by substitution, using the relation $de^{At}/dt = Ae^{At}$.
When $A$ is not a constant matrix, we cannot write the solution as above. Instead
we have

\[ x(t) = \phi(t, 0) x(0) + \int_0^t \phi(t, \tau) B u(\tau)\, d\tau, \]

where $\phi(t, \tau)$ is an $(n \times n)$ matrix called the fundamental solution matrix. It
satisfies

\[ \frac{d\phi(t, \tau)}{dt} = A \phi(t, \tau), \qquad \phi(\tau, \tau) = I. \]

In other words, $\phi(t, \tau) = e^{A(t-\tau)}$ if $A$ is constant. Otherwise, the explicit form
of $\phi(t, \tau)$ is not usually available.

With a constant $A$, a change of variable shows that

\[ \int_0^t e^{A(t-\tau)} B u(\tau)\, d\tau = \int_0^t e^{A\tau} B u(t-\tau)\, d\tau. \]

We see that $e^{A\tau} B$ is the impulse response to the input $\tau$ time units earlier, i.e.,
the dynamic multiplier of $u(t-\tau)$ on $x(t)$. It is called the impulse response because
a narrow pulse (impulse)

\[ u(s) = \begin{cases} 1/\varepsilon & \text{over } [t-\tau,\ t-\tau+\varepsilon], \\ 0 & \text{elsewhere,} \end{cases} \]

approximately gives rise to $x(t) \approx e^{A\tau} B$. More generally $\phi(t, \tau)$ is the impulse
response function.
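
As a numerical check of formula (3) and of the narrow-pulse interpretation, here is a minimal Python sketch for the scalar case. The scalar restriction, the constant input, the Runge-Kutta integrator, and all function names are assumptions made for illustration; with a scalar $a \neq 0$ and constant $u$, the integral in (3) evaluates to $(bu/a)(e^{at} - 1)$.

```python
import math


def exact_solution(a, b, x0, u, t):
    """Formula (3) for scalar a != 0 and a constant input u."""
    return math.exp(a * t) * x0 + (b * u / a) * (math.exp(a * t) - 1.0)


def rk4_solution(a, b, x0, u, t, steps=1000):
    """Integrate dx/dt = a*x + b*u with the classical Runge-Kutta scheme."""
    h = t / steps
    x = x0
    for _ in range(steps):
        k1 = a * x + b * u
        k2 = a * (x + 0.5 * h * k1) + b * u
        k3 = a * (x + 0.5 * h * k2) + b * u
        k4 = a * (x + h * k3) + b * u
        x += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return x


def pulse_response(a, b, tau, eps):
    """x(t) from (3) with x(0) = 0 and a pulse of height 1/eps on [t-tau, t-tau+eps].

    Assumes 0 < eps <= tau so that the pulse ends no later than time t.
    """
    return b * math.exp(a * tau) * (1.0 - math.exp(-a * eps)) / (a * eps)


x_formula = exact_solution(-2.0, 1.0, 1.0, 3.0, 1.5)
x_numeric = rk4_solution(-2.0, 1.0, 1.0, 3.0, 1.5)
x_pulse = pulse_response(-2.0, 1.0, 0.5, 1e-6)  # close to exp(a*tau)*b = exp(-1)
```

As `eps` shrinks, `pulse_response` approaches $e^{a\tau} b$, which is the sense in which $e^{A\tau}B$ is the response to an impulse applied $\tau$ time units earlier.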

Linear dynamic systems whose characteristics remain the same through time
are called time-invariant or time-homogeneous systems. They are more conveniently
handled using the Laplace transform. The Laplace transform of $x(t)$ is defined by

\[ X(p) = \int_0^{\infty} x(t) e^{-pt}\, dt \]

if the integral exists. For example, Laplace transforms are defined for all $x(\cdot)$
such that

\[ \int_0^{\infty} |x(t)|\, dt < \infty. \]

Laplace transforms of impulse response functions are called transfer functions.

With $x(0)$ zero in (3), the Laplace transform of $x(\cdot)$ equals

\[ X(p) = \int_0^{\infty} dt\, e^{-pt} \int_0^t e^{A(t-\tau)} B u(\tau)\, d\tau \]

\[ \phantom{X(p)} = \int_0^{\infty} e^{A\tau} B e^{-p\tau}\, d\tau \int_0^{\infty} u(v) e^{-pv}\, dv \]

(4)    \[ \phantom{X(p)} = H(p) U(p), \]

where $U(p)$ is the Laplace transform of $u(\cdot)$ and $H(p)$ is the Laplace transform
of the impulse response function $e^{At} B$. Relation (4) is said to "transfer" the effects
of "input" onto the "output", hence the name transfer functions.
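
The Laplace transform of the impulse response can be evaluated in closed form. The following worked step is a sketch under the assumption that the real part of $p$ exceeds the real parts of all eigenvalues of $A$, so that the integrand decays and the upper limit contributes nothing:

```latex
\[
\int_0^{\infty} e^{At} B\, e^{-pt}\, dt
  = \left( \int_0^{\infty} e^{-(pI - A)t}\, dt \right) B
  = \Bigl[ -(pI - A)^{-1} e^{-(pI - A)t} \Bigr]_0^{\infty} B
  = (pI - A)^{-1} B .
\]
```

In the scalar case $A = a$, $B = b$, this reduces to the familiar $b/(p - a)$, which has its single pole at $p = a$.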

We note that

\[ H(p) = (pI - A)^{-1} B = N(p)/D(p), \]

where

\[ D(p) = |pI - A| \]

is the characteristic polynomial of the matrix $A$. See Aoki [1976, p.45] for
an algorithm for calculating the numerator $N(p)$. The transfer function is a
ratio of two polynomials, i.e., a rational transfer function. The zeros of the
numerator polynomial are called zeros of the transfer function. The zeros of
the denominator polynomial are the poles of $H(p)$. A rational transfer function
is stable if and only if all poles lie in the left half of the complex plane.
A pole in the right half plane gives rise to an impulse response that grows
exponentially in magnitude. For stability reasons we exclude systems with such
unstable impulse responses from consideration.
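
To make the pole condition concrete, the following Python sketch computes the characteristic polynomial, the poles, and $H(p) = (pI - A)^{-1}B$ for a two-dimensional system. The restriction to 2x2 matrices, the example matrix, and the function names are assumptions for illustration; this is not the algorithm of Aoki [1976].

```python
import cmath


def char_poly_2x2(A):
    """Coefficients (1, c1, c0) of D(p) = |pI - A| = p^2 - tr(A) p + det(A)."""
    (a11, a12), (a21, a22) = A
    return 1.0, -(a11 + a22), a11 * a22 - a12 * a21


def poles_2x2(A):
    """Roots of the characteristic polynomial, i.e., the poles of H(p)."""
    _, c1, c0 = char_poly_2x2(A)
    disc = cmath.sqrt(c1 * c1 - 4.0 * c0)
    return ((-c1 + disc) / 2.0, (-c1 - disc) / 2.0)


def transfer_2x2(A, B, p):
    """H(p) = (pI - A)^{-1} B via the explicit 2x2 inverse (adjugate / determinant)."""
    (a11, a12), (a21, a22) = A
    m11, m12, m21, m22 = p - a11, -a12, -a21, p - a22
    det = m11 * m22 - m12 * m21
    b1, b2 = B
    return ((m22 * b1 - m12 * b2) / det, (-m21 * b1 + m11 * b2) / det)


def is_stable(A):
    """True when every pole lies strictly in the left half plane."""
    return all(pole.real < 0 for pole in poles_2x2(A))


# Illustrative system with D(p) = p^2 + 5p + 6 = (p + 2)(p + 3).
A = [[0.0, 1.0], [-6.0, -5.0]]
B = [0.0, 1.0]
poles = poles_2x2(A)
H_at_1 = transfer_2x2(A, B, 1.0)  # equals (1, p) / D(p) evaluated at p = 1
```

For this choice of `A` and `B` the two components of $H(p)$ are $1/D(p)$ and $p/D(p)$, so both share the poles $-2$ and $-3$ and the system is stable.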

3.2 Inverse Systems

What distinguishes two stable transfer functions whose zeros are mirror
images of each other with respect to the imaginary axis? An example helps here.
Let $H_1(p) = (p+1)/[(p+2)(p+3)]$ and $H_2(p) = (p-1)/[(p+2)(p+3)]$. Note that $H_2(p)$
differs from $H_1(p)$ by a factor $(p-1)/(p+1)$. This factor has magnitude 1 for any
$p = j\omega$ on the imaginary axis, because $(p-1)/(p+1) = -e^{-j\phi(\omega)}$, where
$\phi(\omega) = 2\tan^{-1}\omega > 0$ for $\omega > 0$. Therefore $e^{j\omega t}$, when applied to
$(p-1)/(p+1)$, produces, apart from the sign, $e^{j(\omega t - \phi(\omega))}$ or
$e^{j\omega(t - \tau(\omega))}$, where $\tau(\omega) = \phi(\omega)/\omega$. We call $\tau(\omega)$ the (phase-) delay.
$H_2(p)$ has the same magnitude as $H_1(p)$. Its (phase-) delay, however, is larger than
that of $H_1(p)$ by $\tau(\omega)$. For this reason we call transfer functions with zeros in
the right half plane non-minimum phase transfer functions. The crucial point to
note is that the inverse of $H_2(p)$ is unstable but $1/H_1(p)$ is stable. In the
example, $1/H_2(p) = p + 6 + 12/(p-1)$ and $1/H_1(p) = p + 4 + 2/(p+1)$. The term
$12/(p-1)$ produces a divergent impulse response $12e^{t}$, while $2/(p+1)$ produces a
convergent response $2e^{-t}$. If a system has a minimum phase transfer function
$H(p)$, then following it with another system, called the inverse system, with
the transfer function $1/H(p)$ recovers the original impulse. Chapter 4 again
takes up the inverse system. Our interest in inverse systems lies not so much
in deterministic systems but rather in stochastic systems, where the inverse
systems are related to the notion of calculating innovation sequences (i.e., by
whitening filters that convert a weakly stationary sequence with specified
covariances to a white noise sequence) or to shaping filters that reproduce a given
covariance sequence from a white noise sequence. We return to these topics in
Chapter 10.
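
The claims in this example can be verified with a few lines of complex arithmetic. The Python sketch below checks that $H_1$ and $H_2$ have equal magnitude on the imaginary axis and that the expansions of their inverses hold at a test point. The test frequencies and the test point `p0` are arbitrary choices; the expansion of $1/H_1(p)$ used here, $p + 4 + 2/(p+1)$, is obtained by dividing $(p+2)(p+3)$ by $p+1$.

```python
def H1(p):
    """Minimum phase example: zero at p = -1."""
    return (p + 1) / ((p + 2) * (p + 3))


def H2(p):
    """Non-minimum phase example: zero at p = +1, the mirror image of -1."""
    return (p - 1) / ((p + 2) * (p + 3))


def inv_H1_expanded(p):
    """(p+2)(p+3)/(p+1) = p + 4 + 2/(p+1); the pole at -1 is stable."""
    return p + 4 + 2 / (p + 1)


def inv_H2_expanded(p):
    """(p+2)(p+3)/(p-1) = p + 6 + 12/(p-1); the pole at +1 is unstable."""
    return p + 6 + 12 / (p - 1)


omegas = [0.1, 1.0, 5.0]
magnitudes_match = all(
    abs(abs(H1(1j * w)) - abs(H2(1j * w))) < 1e-12 for w in omegas
)

p0 = 0.7 + 0.3j  # arbitrary test point away from the poles
err_inv_H1 = abs(1 / H1(p0) - inv_H1_expanded(p0))
err_inv_H2 = abs(1 / H2(p0) - inv_H2_expanded(p0))
```

Equal magnitudes mean that the two systems cannot be told apart from magnitude (or, in the stochastic setting, covariance) information alone; only the location of the zero decides whether the inverse is stable.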

3.3 Discrete-Time Sequences



Economic time series are often subjected to one or more data processing
procedures. These produce one or more sequences from a given one. Even when
no causal or dynamic relations are implied or present, a stochastic sequence
$\{y_t\}$ with a complicated sequence of covariances may be conveniently represented
as the output of passing a white noise sequence, i.e., a serially uncorrelated
sequence with constant variance, through a filter, i.e., as the output of a dynamic
system subject to white noise.

Input-output sequences are related by the transfer function $H(z)$, with the
argument $z$ being $L^{-1}$. We can relate $H(z)$ to the transfer function of some
continuous time dynamic system by setting $z$ to $e^{j\omega T}$, where $T$ is a basic sampling
period of the sequences, i.e., data are collected only at integer multiples of $T$.
A stable $H(z)$ has all its poles inside the unit circle. This fact has been
discussed elsewhere. Just as transfer functions of continuous dynamic systems
with the same magnitude can have different (phase) delays, by having their zeros
in symmetric positions across the imaginary axis, the same variance-covariance
structures can be represented by two transfer functions whose zeros are inside
and outside the unit circle in the z-plane, respectively.
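
A first order moving average gives the simplest instance of this non-uniqueness. The Python sketch below is an illustration under assumed values (the coefficients 0.5 and 2, and the noise variances 4 and 1, are chosen for the example): the filter $1 + \theta z^{-1}$ has its zero at $z = -\theta$, and the two parameter choices produce identical autocovariances.

```python
def ma1_autocovariances(theta, sigma2):
    """Lag-0 and lag-1 autocovariances of y_t = u_t + theta * u_{t-1}.

    u_t is white noise with variance sigma2; all higher lags are zero.
    """
    return sigma2 * (1.0 + theta ** 2), sigma2 * theta


# Zero of 1 + 0.5 z^{-1} at z = -0.5 (inside the unit circle), noise variance 4.
inside = ma1_autocovariances(0.5, 4.0)
# Zero of 1 + 2 z^{-1} at z = -2 (outside the unit circle), noise variance 1.
outside = ma1_autocovariances(2.0, 1.0)
```

Both calls return the pair (5.0, 2.0), so second order moments alone cannot distinguish the two filters; the choice between them is made on other grounds, such as stability of the inverse.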

Suppose $y(n) = \sum_m h(n-m) u(m)$, where we write $u(n)$ to indicate the value of
$u(t)$ at $t = nT$, with $T$ being the period of data collection. The z-transforms
of sequences are defined by

\[ Y(z) = \sum_{n=-\infty}^{\infty} y(n) z^{-n}, \quad H(z) = \sum_{n=-\infty}^{\infty} h(n) z^{-n} \quad \text{and} \quad U(z) = \sum_{n=-\infty}^{\infty} u(n) z^{-n}. \]

The convolution relation becomes

\[ Y(z) = H(z) U(z). \]

This is the discrete-time version of (1). We further discuss z-transforms in
Appendix A.5. The impulse response sequence $\{h_n\}$ is causal if $h_n = 0$ for negative
$n$. For discrete-time causal dynamic systems the convolution expression becomes

\[ y(t) = \sum_{n=-\infty}^{t} h(t-n) u(n) = \sum_{n=0}^{\infty} h(n) u(t-n). \]
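
The two forms of the causal convolution sum can be compared directly. The Python sketch below assumes, for illustration only, a finite impulse response and an input that is zero before time 0, so that both sums are finite; the function names and the sample sequences are hypothetical.

```python
def causal_convolve(h, u):
    """y(t) = sum over n >= 0 of h(n) * u(t - n), with u(t) = 0 for t < 0."""
    y = []
    for t in range(len(u)):
        last = min(t, len(h) - 1)
        y.append(sum(h[n] * u[t - n] for n in range(last + 1)))
    return y


def causal_convolve_alt(h, u):
    """Same output written as y(t) = sum over m <= t of h(t - m) * u(m)."""
    y = []
    for t in range(len(u)):
        y.append(sum(h[t - m] * u[m] for m in range(t + 1) if t - m < len(h)))
    return y


h = [1.0, 0.5, 0.25]        # causal impulse response: h(n) = 0 for n < 0
u = [1.0, 2.0, 0.0, -1.0]   # input sequence starting at time 0
y = causal_convolve(h, u)
y_alt = causal_convolve_alt(h, u)
```

Each output value uses only current and past inputs, which is exactly what causality of the impulse response sequence guarantees.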

A transformation $p = (1 - z^{-1})/(1 + z^{-1})$ translates the results for the continuous
and discrete time dynamics. This is a conformal mapping of a complex plane $p$
into another complex plane $z$. The imaginary axis in the p-plane becomes the unit
circle $|z| = 1$. The region $\mathrm{Re}(p) < 0$ is mapped into $|z| < 1$. In this formal way
we associate two transfer functions $h_1(z)$ and $h_2(z)$, with zeros inside and outside
the unit circle, with a pair of transfer functions $H_1(p)$ and $H_2(p)$ with the zeros
in symmetric positions across the imaginary axis. As with the continuous transfer
function, one with its zero inside the unit circle produces a stable and causal
inverse. Because of the relation $(1-z)/(1+z) = (z^{-1}-1)/(z^{-1}+1) = -(1-z^{-1})/(1+z^{-1})$,
the pair $z$ and $1/z$ are mirror images in the z-plane with respect to the unit
circle, corresponding to the fact that $p$ and $-p$ are mirror images about the
imaginary axis in the p-plane. For example, $h_1(z) = (1 + 0.5z^{-1})/(1 + 0.1z^{-1})$ produces
$1/h_1(z) = (1 + 0.1z^{-1})/(1 + 0.5z^{-1})$. But $h_2(z) = (1 + 2z^{-1})/(1 + 0.1z^{-1})$, with a zero at
$z = -1/0.5 = -2$, which is the mirror image of $z = -0.5$ relative to the circle
$|z| = 1$, produces an unstable inverse $1/h_2(z) = (1 + 0.1z^{-1})/(1 + 2z^{-1})$.
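
The stable and unstable inverses in this example can be seen by running the corresponding recursions. The Python sketch below implements a first order filter $(1 + b_1 z^{-1})/(1 + a_1 z^{-1})$ as a difference equation with zero initial conditions (the function name, the impulse length of 20, and the zero initial conditions are assumptions for illustration).

```python
def filter_first_order(b1, a1, u):
    """Apply (1 + b1 z^{-1}) / (1 + a1 z^{-1}) to the sequence u.

    Recursion: y_t = -a1 * y_{t-1} + u_t + b1 * u_{t-1}, zero initial conditions.
    """
    y = []
    y_prev = 0.0
    u_prev = 0.0
    for u_t in u:
        y_t = -a1 * y_prev + u_t + b1 * u_prev
        y.append(y_t)
        y_prev, u_prev = y_t, u_t
    return y


impulse = [1.0] + [0.0] * 19

# h1 = (1 + 0.5 z^{-1}) / (1 + 0.1 z^{-1}) followed by its inverse.
through_h1 = filter_first_order(0.5, 0.1, impulse)
recovered = filter_first_order(0.1, 0.5, through_h1)

# Impulse responses of the two inverses.
inv_h1_response = filter_first_order(0.1, 0.5, impulse)  # pole at z = -0.5
inv_h2_response = filter_first_order(0.1, 2.0, impulse)  # pole at z = -2
```

Passing the output of $h_1$ through $1/h_1$ returns the original impulse, and the response of $1/h_1$ decays geometrically, whereas the response of $1/h_2$ doubles in magnitude at every step.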




4 TIME SERIES REPRESENTATION 



Economic data are inherently noisy. We regard them as (discrete-time)
stochastic processes, using the first and second order moments to characterize them.
By removing known mean values from data $\{y_t\}$, the first moments can be taken to
be zero. So we focus on the structure of second order moments. Thus,
information contained in data is summarized by sequences of covariance matrices
$\Lambda_{t,s} = E(y_t y_s')$. We attempt to duplicate the covariance matrix sequence of a
given time series by solutions of a difference equation with zero-mean and finite
covariance stochastic processes as inputs.

When translation of input signals along the time axis merely translates
the output time functions by the same amount and leaves (first and second order)
moments invariant, the process is called weakly stationary. Covariance
matrices of weakly stationary processes depend on the time difference $t-s$
rather than on $t$ and $s$ separately, and are written as $\Lambda_{t-s}$ rather than as
$\Lambda_{t,s}$. Weakly stationary stochastic processes are modeled by time homogeneous
dynamic equations. Time homogeneity and linearity alone tell us a lot about model
dynamics, as we have shown in Chapter 3. Otherwise time series are nonstationary
and are modeled by difference equations with time-varying coefficients. (Dynamics
whose properties change with translation along the time axis are called
time-varying or time inhomogeneous.)

To describe the future time paths of a time series we generally need the
values of the current exogenous vector and values of one or more of the
predetermined, i.e., lagged endogenous and exogenous, variables. How many of these
predetermined variables do we need? That depends on the complexity of the structures
generating the time series, and must eventually be estimated from data. In
Chapter 3 we explained that the notion of transfer functions is natural for
weakly stationary processes. A stationary output produced by the input $e^{j\omega t}$
is a constant multiple of the input,

\[ y(t) = H(j\omega) e^{j\omega t}. \]

This expression $H(j\omega)$, which is a complex number in general, is called the
frequency response or transfer function and shows how the system responds to signals
of different frequencies. The idea is that $H(j\omega)$ tells us how the effect of input
is transferred onto the output. The above description is somewhat geared towards
continuous-time systems. However, the notion of transfer function is also natural
in discrete-time processes.

Earlier we spoke of using difference equations to represent time series. In
the economics and econometrics literature we often find difference equations written
using the lag operator $L$; we write $y_t - 7y_{t-1} + 4y_{t-2}$ as $(1 - 7L + 4L^2)y_t$, for
example. Generally, the equation

(1)    \[ \phi(L) y_t = \psi(L) u_t, \]

for some (polynomial) functions $\phi(L)$ and $\psi(L)$, is one of the most common
representations of time series. When $y_t$ and $u_t$ are scalars, $\phi(L)$ is a polynomial
(of degree $p$) in the lag operator $L$ and $\psi(L)$ is another polynomial (of degree $q$)
in $L$. Expressions such as $\phi(L)y_t$ are thus a convenient short-hand notation for a
linear relation among $y_t, y_{t-1}, \ldots, y_{t-p}$. The mode of representation associated
with (1) is basically a reduced form in economics. The ratio $\psi(L)/\phi(L)$ is the
transfer function. When $y$ and/or $u$ are vectors, $\phi$ and $\psi$ generally become matrices
where each element is a polynomial in $L$, and $\phi^{-1}\psi$ is the transfer function matrix.
We use scalar valued time series to describe various models, and return later to
vector-valued models. When the polynomial $\psi$ is 1, we call the model
autoregressive (AR). A model with $\phi = 1$ is called a moving average (MA) model. A
'generic' model is a combination of these two, where neither $\phi$ nor $\psi$ is one, and
is called an autoregressive-moving average (ARMA) model. When $u$ contains a non-random
exogenous component, we sometimes speak of ARMAX models.
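
The lag operator notation translates directly into a short routine. The Python sketch below applies a lag polynomial, given by its coefficient list, to a finite data sequence; the function name and the sample data are hypothetical, and only those time points for which all required lags are available are returned.

```python
def apply_lag_polynomial(coeffs, y):
    """Return the values of c_0 y_t + c_1 y_{t-1} + ... + c_p y_{t-p}.

    coeffs = [c_0, c_1, ..., c_p] are the coefficients of 1, L, ..., L^p.
    Output starts at t = p, the first index with a full set of lags.
    """
    p = len(coeffs) - 1
    return [
        sum(c * y[t - k] for k, c in enumerate(coeffs))
        for t in range(p, len(y))
    ]


y = [1.0, 2.0, 4.0, 8.0, 16.0]
# (1 - 7L + 4L^2) y_t = y_t - 7 y_{t-1} + 4 y_{t-2}
result = apply_lag_polynomial([1.0, -7.0, 4.0], y)
```

For the sample data the first output is $4 - 7 \cdot 2 + 4 \cdot 1 = -6$, in agreement with writing out $y_t - 7y_{t-1} + 4y_{t-2}$ by hand.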

A systematic or deterministic component of a time series is usually removed
before further processing of data. One common way to remove a deterministic
component is to difference data one or more times. When $y$ is not the original
data series but one obtained by differencing the original time series, we call
the model for the original series an autoregressive-integrated moving average
(ARIMA) model.
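
Differencing can be illustrated on purely deterministic sequences. In the Python sketch below (the function name and the trend coefficients are assumptions for illustration), one difference reduces a linear trend to a constant and two differences do the same for a quadratic trend.

```python
def difference(y, times=1):
    """Apply the operator (1 - L) to the sequence y the given number of times."""
    for _ in range(times):
        y = [y[t] - y[t - 1] for t in range(1, len(y))]
    return y


linear_trend = [3.0 + 2.0 * t for t in range(6)]
quadratic_trend = [float(t * t) for t in range(6)]

d1 = difference(linear_trend)        # constant equal to the slope
d2 = difference(quadratic_trend, 2)  # constant equal to twice the leading coefficient
```

Each application of the difference shortens the series by one observation, which is the price paid for removing the trend.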

Older books on economic dynamics such as Allen [1966] , Baumol [1970] and 
Gandolfo [1971] use this approach. They typically work their way up from models 
described by first order differential (or difference) equations, to models governed 
by the second order dynamics and finally to models of higher order dynamics. 

Dynamics are also introduced into $\{y_t\}$ when we carry out some data processing
operations on them. The effects of such processing on $y$ can also be conveniently
expressed using lag polynomials or, in general, transfer functions in $L$. See
Appendices A.1 and A.5 for a concise exposition of difference equations and
z-transforms.

The system literature favors the other approach, and uses the state space
or Markovian representation, which expresses dynamics by a first order difference
equation for an internal or auxiliary set of variables $z_t$, called the state
vector,

\[ z_{t+1} = A z_t + B u_t \quad \text{: state equation,} \]

and relates $y_t$ and $u_t$ to the state vector by

\[ y_t = C z_t + D u_t \quad \text{: output or data (observation) equation.} \]

This representation is closer in spirit to structural forms than to reduced forms
in economics and econometrics. We can easily establish the equivalence of these
two representations.

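An example may clarify the difference in representation. Suppose a stochastic
process is described by

$$\phi(L) y_t = u_t$$

where

$$\phi(L) = 1 - \alpha_1 L - \alpha_2 L^2 - \cdots - \alpha_p L^p.$$

Then the transfer function h(L) connects $u_t$ to $y_t$ by

$$y_t = h(L) u_t$$

where

$$h(L) = 1/\phi(L).$$

The same time series represented by the state space (or Markovian) model is

$$z_{t+1} = A z_t + b u_t$$

where

$$z_t' = (y_{t-p+1},\; y_{t-p+2},\; \ldots,\; y_{t-1},\; y_t - u_t),$$

$$A = \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ \alpha_p & \alpha_{p-1} & \alpha_{p-2} & \cdots & \alpha_1 \end{pmatrix}, \qquad b = \begin{pmatrix} 0 \\ \vdots \\ 0 \\ 1 \\ \alpha_1 \end{pmatrix},$$

and

$$y_t = (0 \; \cdots \; 0 \; 1) z_t + u_t.$$

We return to discuss general "conversion" methods for the two modes of
representation in the next chapter.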
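As a check on the example above, the following Python sketch builds a state space model for $\phi(L)y_t = u_t$ and compares its output with the direct AR recursion. It assumes the state is $z_t = (y_{t-p+1}, \ldots, y_{t-1}, y_t - u_t)'$, zero pre-sample values, and illustrative coefficients; the function names are hypothetical and not from the notes.

```python
import numpy as np

def ar_state_space(alpha):
    """Return (A, b, c) for phi(L) y_t = u_t, alpha = [alpha_1, ..., alpha_p].

    Assumed state: z_t = (y_{t-p+1}, ..., y_{t-1}, y_t - u_t)'.
    """
    alpha = np.asarray(alpha, dtype=float)
    p = alpha.size
    A = np.zeros((p, p))
    A[:-1, 1:] = np.eye(p - 1)   # shift rows: z_i(t+1) = z_{i+1}(t)
    A[-1, :] = alpha[::-1]       # last row (alpha_p, ..., alpha_1)
    b = np.zeros(p)
    if p > 1:
        b[-2] = 1.0              # y_t = z_p(t) + u_t enters z_{p-1}(t+1)
    b[-1] = alpha[0]
    c = np.zeros(p)
    c[-1] = 1.0
    return A, b, c

def simulate_state_space(alpha, u):
    A, b, c = ar_state_space(alpha)
    z = np.zeros(len(alpha))     # zero pre-sample values
    y = np.empty(len(u))
    for t, ut in enumerate(u):
        y[t] = c @ z + ut        # output equation
        z = A @ z + b * ut       # state equation
    return y

def simulate_direct(alpha, u):
    p = len(alpha)
    y = np.zeros(len(u))
    for t in range(len(u)):
        past = sum(alpha[k] * y[t - 1 - k] for k in range(p) if t - 1 - k >= 0)
        y[t] = past + u[t]
    return y

rng = np.random.default_rng(0)
alpha = [0.5, -0.3, 0.1]         # illustrative AR coefficients
u = rng.standard_normal(50)
y_ss = simulate_state_space(alpha, u)
y_direct = simulate_direct(alpha, u)
print(np.allclose(y_ss, y_direct))
```

Only the dimension of A changes when p changes; the simulation loop is the same for every order.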

One may wonder about the wisdom of this second, 'obviously' round-about
way of describing time series. However, the ease of expressing solutions of
vector-valued first order difference equations definitely makes this a worthwhile
mode of representing time series. We later elaborate on this point in detail. State
space representation of dynamic systems has several features to recommend it over
the other, even though the latter is more familiar to economists and
econometricians. One is the standardization of model representation achieved by
this procedure. Models are always stated as a first order difference equation for
the state vector. Only the dimension of the state vector varies from time series to
time series. Although only practice and experience really bring home the superior
nature of this representation for some purposes, it should be apparent that economy
of thought is achieved, and that uniform representation facilitates development
of solution algorithms.

Models in either of the two modes of time series representation determine the
time paths of the endogenous variables, given time paths of exogenous variables,
i.e., the sequence $\{y_t\}$ is obtained as the solution of appropriate difference
equations with the exogenous variables as inputs. We now turn to the nonuniqueness
of such representations. More than one difference equation driven by the
same input stochastic process may yield the same covariance sequence as the data
$\{y_t\}$. A related question is this: When do we construct an AR model $\phi(L)y_t = u_t$,
and when an MA model $y_t = \psi(L)u_t$? Or, is there any restriction on writing $\psi(L)$
as $1/\phi(L)$?

Adequate answers to these questions depend on the locations of the zeros
of the numerator polynomials of the transfer functions as we next demonstrate.

The problem is related to the notion of the inverse system of Chapter 3. We
call a dynamic system with the representation $y_t = h(L)u_t$ the inverse of another,
$u_t = h(L)^{-1}y_t$, because the roles of the inputs, or forcing terms, and the outputs
are reversed in these two dynamic systems. To be useful, these two dynamics must both
be stable. This means that zeros of both the numerator polynomial and denominator
polynomial must be stable. A simple example illustrates: Let two positive constants
$c_0$ and $c_1$ define two sequences by

$$\xi_t = c_0 \varepsilon_t + c_1 \varepsilon_{t-1},$$

and

$$\eta_t = c_1 \varepsilon_t + c_0 \varepsilon_{t-1},$$

where $0 < c_1 < c_0$, and $\{\varepsilon_t\}$ is a mean-zero white noise sequence such that

$$E\varepsilon_t = 0, \qquad E\varepsilon_t \varepsilon_s = \sigma^2 \delta_{t,s},$$

where $\delta_{t,s}$ is one for t = s and zero otherwise. These two sequences have
identical variances and covariances,

$$\mathrm{var}\, \xi_t = \mathrm{var}\, \eta_t = (c_0^2 + c_1^2)\sigma^2 \quad \text{and} \quad \mathrm{cov}(\xi_t, \xi_{t-1}) = \mathrm{cov}(\eta_t, \eta_{t-1}) = c_0 c_1 \sigma^2.$$

Other covariances are zero. Yet only one is causally invertible:

$$\xi_t = (c_0 + c_1 z^{-1})\varepsilon_t$$

can be solved as

$$\varepsilon_t = \frac{1}{c_0}\{1 + a z^{-1} + a^2 z^{-2} + \cdots\}\xi_t,$$

since $a = -c_1/c_0$ has magnitude less than 1. The same point can be made by a
slightly different pair of sequences: $\eta_t = \varepsilon_t + \rho\varepsilon_{t-1}$ with $|\rho| > 1$ is not causally
invertible, while $\xi_t = \varepsilon_t + \rho\varepsilon_{t-1}$ with $|\rho| < 1$ is.
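The point can be checked numerically. The sketch below is illustrative only: the values of $c_0$ and $c_1$, the unit noise variance, and the function names are assumptions. It computes the variance and lag-one covariance of two first-order moving averages whose coefficients are interchanged, together with the weights of the formal expansion of the noise in current and past data.

```python
import numpy as np

def ma1_autocovariances(b0, b1, sigma2=1.0):
    """Variance and lag-one covariance of x_t = b0*e_t + b1*e_{t-1}."""
    return (b0**2 + b1**2) * sigma2, b0 * b1 * sigma2

def inverse_weights(b0, b1, n):
    """First n weights w_k in the formal expansion e_t = sum_k w_k x_{t-k}."""
    a = -b1 / b0
    return np.array([a**k / b0 for k in range(n)])

c0, c1 = 2.0, 1.0                      # illustrative values with 0 < c1 < c0
cov_xi = ma1_autocovariances(c0, c1)   # xi_t uses coefficients (c0, c1)
cov_eta = ma1_autocovariances(c1, c0)  # eta_t uses coefficients (c1, c0)
w_xi = inverse_weights(c0, c1, 20)     # |a| = c1/c0 < 1: weights die out
w_eta = inverse_weights(c1, c0, 20)    # |a| = c0/c1 > 1: weights explode

print(cov_xi == cov_eta)
print(abs(w_xi[-1]), abs(w_eta[-1]))
```

The second moments agree exactly, so covariance data alone cannot tell the two sequences apart; only the first expansion converges.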

These two sequences become distinguishable when we examine their phase
characteristics: Let T = 1 for simplicity. Define the transfer function

$$H_\xi(e^{j\omega}) = c_0 + c_1 e^{j\omega} = \sqrt{c_0^2 + 2c_0 c_1 \cos\omega + c_1^2}\; e^{j\tau_\xi(\omega)}$$

where

$$\tan \tau_\xi(\omega) = c_1 \sin\omega / (c_0 + c_1 \cos\omega)$$

and

$$H_\eta(e^{j\omega}) = c_1 + c_0 e^{j\omega} = \sqrt{c_0^2 + 2c_0 c_1 \cos\omega + c_1^2}\; e^{j\tau_\eta(\omega)}$$

where

$$\tan \tau_\eta(\omega) = c_0 \sin\omega / (c_1 + c_0 \cos\omega).$$

Clearly $\tau_\eta(\omega) > \tau_\xi(\omega)$, i.e., the phase of the transfer function for $\{\eta_t\}$ is
larger than that for $\{\xi_t\}$. For continuous dynamics, a transfer function which is
rational in the complex variable s is stable if and only if all zeros of the
denominator lie in the left half of the complex s plane. A zero of the denominator
in the right half plane gives rise to an impulse response exponentially growing in
magnitude. For stability reasons we exclude systems with such unstable impulse
responses from consideration.




5 EQUIVALENCE OF ARMA AND STATE SPACE MODELS



This section explains how non-Markovian models can be converted into
Markovian models in simple and systematic ways. This conversion uses changes
of variables that may at first sight appear to be arbitrary, but are in fact
quite natural once the procedures are understood.

Although quite simple, and possible in many different ways, the conversion
procedures of this section incorporate some thought and care to achieve "minimal
dimensional" Markovian models. Other seemingly simpler procedures may produce
state space models with redundant or irrelevant information.
Generally speaking, non-minimal dimensional models are to be avoided because
their needlessly high dimensions may prevent efficient optimization calculations.
If informational redundancy is the only problem, non-minimal dimensional models
are not to be frowned upon too much.
However, they may suffer from other technical deficiencies not obvious at first.
For example, algorithms for filtering, estimation and control are conventionally
stated for minimal dimensional models and may break down or require special
handling if applied to non-minimal dimensional systems because technical conditions
(such as positive-definiteness of matrices assumed in the algorithms) may be
violated.

We discuss only scalar systems because for them the question of minimal
dimensional representation does not arise, which enables us to concentrate on the
conversion procedures. Our method generalizes to vector valued systems in a natural way.

See Aoki [1981; Appendix A] for an example. We caution the reader that some
other seemingly straightforward extensions of the conversion procedures sometimes
lead to inconvenient state space models.

5.1 AR Models



Time series models can be put into state space form in several ways.
It is probably most convenient to start with AR models.

We use the letter L to denote a backward shift, i.e., $Ly_t = y_{t-1}$.
Then an AR model $y_t - \alpha_1 y_{t-1} - \alpha_2 y_{t-2} - \cdots - \alpha_p y_{t-p} = u_{t-1}$ can be put as

(1) $$\phi(L) y_t = u_{t-1}$$

where

$$\phi(L) = 1 - \alpha_1 L - \alpha_2 L^2 - \cdots - \alpha_p L^p.$$

Here y's and u's are taken to be scalar-valued. Introduce p auxiliary
variables by setting

$$x_1(t) = y_{t-p+1}, \quad x_2(t) = y_{t-p+2}, \quad \ldots, \quad x_{p-1}(t) = y_{t-1}, \quad x_p(t) = y_t.$$

Advancing time by one, and by the above definitions we can write

$$x_1(t+1) = x_2(t),$$
$$x_2(t+1) = x_3(t),$$
$$\vdots$$
$$x_p(t+1) = y_{t+1} = \alpha_1 y_t + \cdots + \alpha_p y_{t-p+1} + u_t = \alpha_1 x_p(t) + \cdots + \alpha_p x_1(t) + u_t.$$

Define a (state) vector of these variables as $x(t) = (x_1(t), \ldots, x_p(t))'$.
Then x(t) evolves with time according to a first order difference equation,
called the state transition equation,

(2) $$x(t+1) = Ax(t) + bu_t,$$

where

$$A = \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ \alpha_p & \alpha_{p-1} & \alpha_{p-2} & \cdots & \alpha_1 \end{pmatrix}, \qquad b = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ 1 \end{pmatrix},$$

and $y_t$ is related to x(t) by an algebraic relation, called an observation or
output equation,

(3) $$y_t = (0 \; \cdots \; 0 \; 1)x(t).$$

The pair of equations (2) and (3) constitutes a state space representation
of (1).

If the right hand side of (1) is $u_t$ rather than $u_{t-1}$, then redefine $x_p(t)$
to be $y_t - u_t$. The definitions for the other components of the state vector
x remain the same. Equation (2) remains valid by redefining b to be
$b' = (0 \; \cdots \; 0 \; 1 \; \alpha_1)$. Equation (3) is replaced by $y_t = (0 \; 0 \; \cdots \; 1)x(t) + u_t$.

5.2 MA Models 



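Let

$$y_t = \psi(L)\varepsilon_t$$

where

$$\psi(L) = \beta_0 + \beta_1 L + \cdots + \beta_q L^q.$$

Introduce state vector components by

$$x_{1t} = \varepsilon_{t-1}, \quad x_{2t} = \varepsilon_{t-2}, \quad \ldots, \quad x_{qt} = \varepsilon_{t-q}.$$

Then advancing t by one we note that

$$x_{1,t+1} = \varepsilon_t,$$
$$x_{2,t+1} = x_{1t},$$
$$\vdots$$
$$x_{q,t+1} = x_{q-1,t}.$$

Hence the state vector $x_t = (x_{1t}, \ldots, x_{qt})'$ evolves with time according to

$$x_{t+1} = \begin{pmatrix} 0 & \cdots & \cdots & 0 \\ 1 & 0 & \cdots & 0 \\ \vdots & \ddots & \ddots & \vdots \\ 0 & \cdots & 1 & 0 \end{pmatrix} x_t + \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix} \varepsilon_t,$$

and $y_t$ is related to the state vector by the equation

$$y_t = (\beta_1 \; \cdots \; \beta_q) x_t + \beta_0 \varepsilon_t.$$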
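A minimal sketch of this conversion in Python follows. It takes the state to be the q most recent past shocks, assumes zero pre-sample shocks, and uses illustrative coefficients and hypothetical function names; the output is compared with the moving average computed directly by convolution.

```python
import numpy as np

def ma_state_space(beta):
    """Return (A, b, c, d) for y_t = psi(L) e_t, beta = [beta_0, ..., beta_q].

    Assumed state: x_t = (e_{t-1}, ..., e_{t-q})', with q >= 1.
    """
    beta = np.asarray(beta, dtype=float)
    q = beta.size - 1
    A = np.zeros((q, q))
    A[1:, :-1] = np.eye(q - 1)   # x_{i,t+1} = x_{i-1,t}
    b = np.zeros(q)
    b[0] = 1.0                   # x_{1,t+1} = e_t
    return A, b, beta[1:].copy(), beta[0]

def simulate(beta, eps):
    A, b, c, d = ma_state_space(beta)
    x = np.zeros(len(b))         # zero pre-sample shocks
    y = np.empty(len(eps))
    for t, e in enumerate(eps):
        y[t] = c @ x + d * e     # output equation
        x = A @ x + b * e        # state transition
    return y

rng = np.random.default_rng(1)
beta = [1.0, 0.4, -0.2, 0.1]     # illustrative MA(3) coefficients
eps = rng.standard_normal(30)
y_ss = simulate(beta, eps)
y_direct = np.convolve(eps, beta)[:len(eps)]
print(np.allclose(y_ss, y_direct))
```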



5.3 ARMA Models* 



Next consider a time series model described by

(4) $$y_t - \alpha_1 y_{t-1} - \cdots - \alpha_p y_{t-p} = \beta_0 u_t + \beta_1 u_{t-1} + \cdots + \beta_{p-1} u_{t-p+1}.$$

It can be put into a form analogous to (1),

$$\phi(L) y_t = \psi(L) u_t,$$

where $\phi(L)$ is as defined above and we define

$$\psi(L) = \beta_0 + \beta_1 L + \cdots + \beta_{p-1} L^{p-1}.$$

Some of the $\beta$'s may be zero. The state space representation of (4) can be
obtained in two steps: First define $v_t$ by



* In converting ARMA models, non-minimal dimensional models may arise.
Several examples of this are found in Chow [1975]. In one such example he
(p. 153) turns a vector ARMA model

$$y_t = A_{1t} y_{t-1} + \cdots + A_{mt} y_{t-m} + C_{0t} x_t + \cdots + C_{nt} x_{t-n} + \cdots$$

into a state space form

$$X_t = A_t X_{t-1} + B x_t + \cdots + D u_t$$

by introducing a vector

$$X_t = (y_t', \ldots, y_{t-m+1}', x_t', \ldots, x_{t-n+1}')'$$

and appropriate matrices $A_t$, B and D.

Clearly, $X_t$ is a state vector because its knowledge, together with a future time
path of exogenous disturbances u's and a control time path x's, enables us
to specify uniquely (the probability distributions of) the future values of
X's. Notice, however, a peculiar feature of his dynamic matrix: $A_t$ contains
a zero row submatrix, corresponding to the identity matrix of the B matrix,
i.e., the matrix $A_t$ is of the form

$$A_t = \begin{pmatrix} \times & \times \\ \times & 0 \\ 0 & 0 \\ 0 & \times \end{pmatrix}$$

where $\times$ marks nonzero submatrices. The state vector is not controllable
in the sense we discuss later (see Aoki [1976; Chapter 3], for example),
even though a subvector made up of y and its lagged values is controllable.

Although nothing is theoretically wrong with this representation because
the relevant part of $X_t$ is controllable, it may be inconvenient to store
redundant information in a computer, and standard algorithms for minimizing
quadratic costs subject to linear dynamics may not apply without modifications.

(5)



$$\phi(L) v_t = u_t.$$

Then (4) can be written as

(6) $$y_t = \psi(L) v_t.$$

The sequence $\{v_t\}$ generated by (5) has the same form as (1), hence can
be put into state space form

(7) $$x(t+1) = Ax(t) + bu_t, \qquad v_t = (0 \; \cdots \; 0 \; 1)x(t) + u_t,$$

by introducing the vector $x(t) = (x_1(t), \ldots, x_p(t))'$ where $x_1(t) = v_{t-p+1}$, ...,
$x_{p-1}(t) = v_{t-1}$ and $x_p(t) = v_t - u_t$. The state vector x(t) is next
related to $y_t$ of (6) by

(8) $$y_t = \beta_0 v_t + \beta_1 v_{t-1} + \cdots + \beta_{p-1} v_{t-p+1}$$
$$\quad = \beta_0 (x_p(t) + u_t) + \beta_1 x_{p-1}(t) + \cdots + \beta_{p-1} x_1(t)$$
$$\quad = c'x(t) + \beta_0 u_t$$

where

$$c' = (\beta_{p-1}, \ldots, \beta_0).$$

Collecting (7) and (8) together, a state space representation of the ARMA
model (4) becomes

(9) $$x(t+1) = Ax(t) + bu_t, \qquad y_t = c'x(t) + \beta_0 u_t.$$
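The representation (9) can be verified on a small case. The sketch below is illustrative only: the coefficient values and function names are assumptions, and b is taken in the redefined form $(0 \; \cdots \; 0 \; 1 \; \alpha_1)'$ of Section 5.1. It compares the impulse response of the state space model with the one obtained by running the ARMA recursion (4) directly.

```python
import numpy as np

def arma_state_space(alpha, beta):
    """Return (A, b, c, d) of form (9).

    alpha = [alpha_1, ..., alpha_p], beta = [beta_0, ..., beta_{p-1}].
    """
    alpha = np.asarray(alpha, dtype=float)
    beta = np.asarray(beta, dtype=float)
    p = alpha.size
    assert beta.size == p, "pad beta with zeros to length p"
    A = np.zeros((p, p))
    A[:-1, 1:] = np.eye(p - 1)
    A[-1, :] = alpha[::-1]        # last row (alpha_p, ..., alpha_1)
    b = np.zeros(p)
    if p > 1:
        b[-2] = 1.0
    b[-1] = alpha[0]
    c = beta[::-1].copy()         # c' = (beta_{p-1}, ..., beta_0)
    return A, b, c, beta[0]

def impulse_response_ss(A, b, c, d, n):
    x = np.zeros(len(b))
    h = np.empty(n)
    for t in range(n):
        u = 1.0 if t == 0 else 0.0
        h[t] = c @ x + d * u
        x = A @ x + b * u
    return h

def impulse_response_direct(alpha, beta, n):
    p = len(alpha)
    h = np.zeros(n)
    for t in range(n):
        ar = sum(alpha[k - 1] * h[t - k] for k in range(1, p + 1) if t - k >= 0)
        ma = beta[t] if t < len(beta) else 0.0   # unit impulse at t = 0
        h[t] = ar + ma
    return h

alpha = [0.6, -0.2]               # illustrative coefficients
beta = [1.0, 0.5]
h_ss = impulse_response_ss(*arma_state_space(alpha, beta), 25)
h_direct = impulse_response_direct(alpha, beta, 25)
print(np.allclose(h_ss, h_direct))
```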

There are other ways of putting ARMA models into state space form.
We discuss two such representations: a controllable and an observable
representation.

For example, given

$$y_t = \alpha_1 y_{t-1} + \cdots + \alpha_p y_{t-p} + \beta_0 u_t + \beta_1 u_{t-1} + \cdots + \beta_p u_{t-p},$$

let

$$x_1(t) = y_t - \beta_0 u_t.$$

Then we can write the above as

$$x_1(t) = L(\alpha_1 y_t + \beta_1 u_t) + \cdots + L^p(\alpha_p y_t + \beta_p u_t),$$

or

$$L^{-1} x_1(t) = x_1(t+1) = \alpha_1 y_t + \beta_1 u_t + x_2(t),$$

where we introduce a new variable by

$$x_2(t) = L(\alpha_2 y_t + \beta_2 u_t) + \cdots + L^{p-1}(\alpha_p y_t + \beta_p u_t).$$

Eliminating $y_t$ by the first equation, the above becomes

$$x_1(t+1) = \alpha_1 x_1(t) + (\alpha_1 \beta_0 + \beta_1) u_t + x_2(t).$$

Rewrite the definitional equation for $x_2(t)$ as

$$L^{-1} x_2(t) = x_2(t+1) = \alpha_2 y_t + \beta_2 u_t + x_3(t)$$
$$\quad = \alpha_2 x_1(t) + (\alpha_2 \beta_0 + \beta_2) u_t + x_3(t),$$

where we introduce a new variable by

$$x_3(t) = L(\alpha_3 y_t + \beta_3 u_t) + \cdots + L^{p-2}(\alpha_p y_t + \beta_p u_t).$$

Continue in this way until we reach

$$x_p(t) = L(\alpha_p y_t + \beta_p u_t)$$

or

$$x_p(t+1) = \alpha_p x_1(t) + (\alpha_p \beta_0 + \beta_p) u_t.$$

Collecting the equations for $x_1(t+1), \ldots, x_p(t+1)$ and defining the column vector
$x(t) = (x_1(t), \ldots, x_p(t))'$ as the state vector, we see that it obeys the state
transition equation

(9') $$x(t+1) = Ax(t) + bu_t,$$

and the output equation becomes

$$y_t = (1 \; 0 \; \cdots \; 0)x(t) + \beta_0 u_t,$$

where

$$A = \begin{pmatrix} \alpha_1 & 1 & 0 & \cdots & 0 \\ \alpha_2 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ \alpha_{p-1} & 0 & 0 & \cdots & 1 \\ \alpha_p & 0 & 0 & \cdots & 0 \end{pmatrix} \quad \text{and} \quad b = \begin{pmatrix} \alpha_1 \beta_0 + \beta_1 \\ \vdots \\ \alpha_p \beta_0 + \beta_p \end{pmatrix}.$$

This representation is called an observable canonical form.

Compare the alternative state space representations (9) and (9').
The representation (9) has a simple b vector with a complicated c vector,
while (9') has a complicated b vector and a simple c vector. The form (9)
is called a controllable representation because the p-vectors b, Ab, ...,
$A^{p-1}b$ are easily seen to be linearly independent because they have the
structure 



_p-l. 

A b = I 



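Examples. Second-order systems are used to give an example of the observable
canonical form and its dual, the controllable canonical form. The precise
sense in which these canonical forms are duals of each other is also
indicated. Consider

$$y_t = -\alpha_1 y_{t-1} - \alpha_2 y_{t-2} + \beta_0 u_t + \beta_1 u_{t-1} + \beta_2 u_{t-2}.$$

Write this as a nested sequence of lag operations:

$$y_t - \beta_0 u_t = L\{(-\alpha_1 y_t + \beta_1 u_t) + L(-\alpha_2 y_t + \beta_2 u_t)\}.$$

Let

$$x_1(t) = y_t - \beta_0 u_t.$$

Substitute $y_t$ out by $x_1(t) + \beta_0 u_t$ to yield

$$x_1(t) = L\{-\alpha_1 x_1(t) + \gamma_1 u_t + L(-\alpha_2 y_t + \beta_2 u_t)\}$$

where

$$\gamma_1 = \beta_1 - \alpha_1 \beta_0.$$

Letting $x_2(t)$ equal $L(-\alpha_2 y_t + \beta_2 u_t)$, and recalling that $L^{-1}x_1(t) = x_1(t+1)$,
we derive

$$x_1(t+1) = -\alpha_1 x_1(t) + \gamma_1 u_t + x_2(t)$$

and

$$x_2(t+1) = -\alpha_2 y_t + \beta_2 u_t = -\alpha_2 x_1(t) + \gamma_2 u_t$$

where

$$\gamma_2 = \beta_2 - \alpha_2 \beta_0.$$

Collect the above two equations to form

(10) $$\begin{pmatrix} x_1(t+1) \\ x_2(t+1) \end{pmatrix} = \begin{pmatrix} -\alpha_1 & 1 \\ -\alpha_2 & 0 \end{pmatrix} \begin{pmatrix} x_1(t) \\ x_2(t) \end{pmatrix} + \begin{pmatrix} \gamma_1 \\ \gamma_2 \end{pmatrix} u_t$$

and

$$y_t = (1 \; 0) \begin{pmatrix} x_1(t) \\ x_2(t) \end{pmatrix} + \beta_0 u_t.$$

This is an observable canonical form.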

The same system can be written as a sequence in $L^{-1}$,

$$L^{-2}(y_t - \beta_0 u_t) + L^{-1}(\alpha_1 y_t - \beta_1 u_t) + (\alpha_2 y_t - \beta_2 u_t) = 0,$$

or

$$(L^{-2} + \alpha_1 L^{-1} + \alpha_2)(y_t - \beta_0 u_t) = (\gamma_1 L^{-1} + \gamma_2) u_t.$$

Define $x_t$ by

$$y_t - \beta_0 u_t = (\gamma_2 L + \gamma_1) x_t.$$

Thus

$$(L^{-2} + \alpha_1 L^{-1} + \alpha_2)(\gamma_2 L + \gamma_1) x_t = (\gamma_1 L^{-1} + \gamma_2) u_t.$$

Rewriting the above as

$$(\gamma_1 L^{-1} + \gamma_2)\{(L^{-1} + \alpha_1 + \alpha_2 L) x_t - u_t\} = 0,$$

let

$$L^{-1} x_t = -(\alpha_1 + \alpha_2 L) x_t + u_t.$$

Redefine $x_t$ as $x_{1t}$ and let $L x_{1t} = x_{2t}$. Then

$$x_{2,t+1} = x_{1t},$$

$$x_{1,t+1} = -\alpha_1 x_{1t} - \alpha_2 x_{2t} + u_t,$$

and

$$y_t = (\gamma_1, \gamma_2) \begin{pmatrix} x_{1t} \\ x_{2t} \end{pmatrix} + \beta_0 u_t,$$

i.e.,

(11) $$\begin{pmatrix} x_{1,t+1} \\ x_{2,t+1} \end{pmatrix} = \begin{pmatrix} -\alpha_1 & -\alpha_2 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} x_{1t} \\ x_{2t} \end{pmatrix} + \begin{pmatrix} 1 \\ 0 \end{pmatrix} u_t$$

and

$$y_t = (\gamma_1, \gamma_2) \begin{pmatrix} x_{1t} \\ x_{2t} \end{pmatrix} + \beta_0 u_t.$$

This is an example of the controllable canonical form. Comparing (10)
and (11), we note that the dynamic matrices are transposes of each other, and
the roles of b and c are interchanged. We call a dynamic system

$$x_{t+1} = Ax_t + bu_t$$
$$y_t = c'x_t + du_t$$

a dual of another dynamic system

$$x_{t+1} = A'x_t + cu_t$$
$$y_t = b'x_t + du_t.$$

This correspondence $A' \leftrightarrow A$, and $b \leftrightarrow c$ is the precise sense in which these
two systems are dual. Intertemporal minimization of a quadratic cost subject
to linear dynamics and the optimal one-step ahead predictor (Kalman
filter) are duals in this sense. See Appendix A.14 also.
This relation is not surprising when we remember that optimal predictions
involve minimization of the estimation error or orthogonal projection. We
return to this topic later.
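The duality of the two second-order canonical forms can be checked numerically. The sketch below uses illustrative coefficient values and hypothetical function names; it builds the observable form (10) and the controllable form (11) and confirms that both reproduce the impulse response of the original difference equation.

```python
import numpy as np

alpha1, alpha2 = 0.5, -0.06            # illustrative coefficients (assumed)
beta0, beta1, beta2 = 1.0, 0.3, 0.2
gamma1 = beta1 - alpha1 * beta0
gamma2 = beta2 - alpha2 * beta0

# Observable canonical form (10)
A_obs = np.array([[-alpha1, 1.0], [-alpha2, 0.0]])
b_obs = np.array([gamma1, gamma2])
c_obs = np.array([1.0, 0.0])

# Controllable canonical form (11): A transposed, b and c interchanged
A_con = np.array([[-alpha1, -alpha2], [1.0, 0.0]])
b_con = np.array([1.0, 0.0])
c_con = np.array([gamma1, gamma2])

def impulse_response(A, b, c, d, n):
    """h_0 = d and h_k = c' A^{k-1} b for k >= 1."""
    h = [d]
    v = b.copy()
    for _ in range(n - 1):
        h.append(c @ v)
        v = A @ v
    return np.array(h)

def impulse_response_direct(n):
    """Run the second-order difference equation on a unit impulse."""
    u = np.zeros(n)
    u[0] = 1.0
    y = np.zeros(n)
    for t in range(n):
        y[t] = beta0 * u[t]
        if t >= 1:
            y[t] += -alpha1 * y[t - 1] + beta1 * u[t - 1]
        if t >= 2:
            y[t] += -alpha2 * y[t - 2] + beta2 * u[t - 2]
    return y

h_obs = impulse_response(A_obs, b_obs, c_obs, beta0, 20)
h_con = impulse_response(A_con, b_con, c_con, beta0, 20)
h_dir = impulse_response_direct(20)
print(np.allclose(h_obs, h_dir), np.allclose(h_con, h_dir))
```

For scalar systems the two forms share one impulse response because $c'A^{k-1}b$ is a scalar and therefore equals its own transpose $b'(A')^{k-1}c$.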




6 DECOMPOSITION OF DATA INTO CYCLICAL AND GROWTH COMPONENTS 



6.1 Reference Paths and Variational Dynamic Models

Time series analysis often assumes zero-mean processes. To accommodate 
this assumption, non mean-zero components, especially secular growth of time 
series must be removed before we can begin. More often, economic time series 
are decomposed into three components; seasonal components, secular growth 
components and cyclical components or fluctuations about the secular growth 
paths. Structural information is then extracted from the remaining cyclical 
components or fluctuations about the growth paths to help predict future 
fluctuations or to discern patterns of cyclical co-movements of elements 
making up the time series, or better to characterize business cycles, and so 
forth. 

This section describes a systematic way for decomposing time series
into the reference path component and fluctuations about it when well
specified models producing time series are posited.

This approach provides one benchmark analysis of time series. The
other benchmark approach does not presuppose any such well-specified models
for secular growth paths but merely extracts smoothly varying paths subject
to some statistical regularity conditions.

Either method of decomposition will leave mean-zero, finite-variance and 
weakly stationary time series as our object to study. If such time series are 
given to us to begin with, we can of course dispense with this preliminary 
phase of data processing and directly construct Markovian (or state-space) 
models or their equivalent ARMA models. This aspect is discussed later. 

How to decompose a given time series is a theoretical and
numerical question sometimes addressed by time series analysts. See Akaike
[1980] or Shiskin and Plewes [1978] on seasonal adjustment, for example.

After seasonal variations are removed, economic time series are then
decomposed into secular growth components and cyclical fluctuations. This
decomposition can of course be done simultaneously. Some time series are
published in already seasonally adjusted forms.

We shall use the word "reference" instead of (secular) growth to denote 
non-zero mean components, and speak of decomposing time series into the 
reference paths and fluctuations or variations about the reference paths. 

What constitutes reference paths largely depends on the amount of structural 
or theoretical information we possess or wish to bring to models that are 
producing the time series in question. 

In an extreme instance, a balanced growth path of a neoclassical
macroeconomic model with constant or variable growth rates may be used as
the reference path.* In cases where exogenous variables are present,
their conditionally expected values may be used together with the hypothesized
(macro) economic models to define a set of reference time paths for endogenous
variables of the models. In these cases fluctuations about the reference
paths can be described by variational models derived from the original models
that define or produce the reference time paths. This view is important and
useful because detrended log-linear models often used in econometrics can be
justified precisely this way. We return to this point and develop it further
shortly.

The other extreme attitude adopted by some practitioners eschews any
such theoretical construct of the economy as unjustified and unwarranted in
view of existing economic theory. The only distinguishing features, then,
between the growth and cyclical components are the relative frequencies
involved. Growth components, for example, are "known" to vary much more
'smoothly' than cyclical components. Operationally, this distinction may
be made by an (arbitrary) assumption on behavior of some higher order
difference (such as the second order) of growth paths being randomly varying
with variances comparable with those for the cyclical components. Hodrick
and Prescott [1981] take such an approach, for example.

* Aoki [1980] is one such example.

6.2 Log-linear Models as Variational Models

We now show that if a model is specified that produces a reference
time path, then the model for fluctuations around it is the same as its
detrended, log-linear model. (Such a model is called the variational model
in the systems literature.) See Aoki [1981; Chapter 2] for additional
details on the variational models. We first establish this connection of
the variational models producing cyclical or fluctuating movements about
the reference time paths with familiar log-linear models.

Log-linear economic models arise in at least two ways; one is familiar 
to economists while the other seems less so. 

An example will illustrate the difference. Consider a money demand
function specified in product form as

(+) $$M/P = e^{-\alpha i} Y^{\beta}.$$

Define $m = \ln M$, $p = \ln P$ and $y = \ln Y$. Take the logarithm of (+) to obtain

$$m - p = -\alpha i + \beta y.$$

This is indeed a familiar demand for real balances specified to be linear
in logarithms of variables, except for the interest rate i.

If the variables are measured as deviations from some reference (long-run
equilibrium, for example), we can redefine the variables:

$$m = \ln(M/M_0), \quad p = \ln(P/P_0) \quad \text{and} \quad y = \ln(Y/Y_0)$$

where the subscript "0" denotes reference. Now, because $\ln M - \ln M_0 = \ln(M/M_0)$
etc., the lower case variables measure deviations from the reference. Note,
however, that this approach works only for functional relations which are
specified in product form.

This approach can not handle relations such as Y = H + I even when the
variables are measured as deviations from a reference, i.e.,
$y = \ln(Y/Y_0) = \ln\{(H + I)/(H + I)_0\}$.

Now the second approach (which is the one in Chapter 2 of Aoki [1981])
is used: Suppose Y = H + I and define lower case letters by

$$Y = Y_0(1 + y), \quad H = H_0(1 + h), \quad I = I_0(1 + r).$$

Then

$$Y_0(1 + y) = H_0(1 + h) + I_0(1 + r)$$

or because $Y_0 = H_0 + I_0$, we can write this equation as

$$Y_0 y = H_0 h + I_0 r$$

or

(*) $$y = (H_0/Y_0)h + (I_0/Y_0)r.$$

By definition, $\ln(Y/Y_0) = \ln(1 + y) \approx y$, $\ln(H/H_0) = \ln(1 + h) \approx h$, and
$\ln(I/I_0) = \ln(1 + r) \approx r$, i.e., (*) is the log-linear model resulting from
the variational approach, and the lower case variables measure deviations
of the logarithms of the corresponding upper case letters.
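A small numerical illustration may help. The reference values and deviations below are made up for the purpose; the sketch checks that (*) holds exactly in proportional deviations and that the proportional deviation is close to the log deviation when the deviations are small.

```python
import math

# Hypothetical reference values with Y0 = H0 + I0
H0, I0 = 80.0, 20.0
Y0 = H0 + I0

# Small proportional deviations of H and I from their reference values
h, r = 0.02, -0.01

# Exact levels and the exact proportional deviation of Y
H = H0 * (1 + h)
I = I0 * (1 + r)
Y = H + I
y_exact = Y / Y0 - 1

# Relation (*): share-weighted sum of the deviations
y_star = (H0 / Y0) * h + (I0 / Y0) * r

# Log deviation, which y approximates to first order
y_log = math.log(Y / Y0)

print(y_exact, y_star, y_log)
```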

This way of deriving log-linear relations from the variational approach
is thus more general because it does not require any specific functional form.
Another example may make this point clear. One way to specify a consumer
price index of a small open economy is to posit

$$P_I = P^a (EP^*/P)^{1-a}$$

where a is the fraction of domestic goods in the consumption bundle. Its
log-linear version is

(#) $$p_I = ap + (1 - a)(e + p^* - p).$$

It is not necessary, however, to adopt such a specific functional form
to produce a log-linear relation. Just specify an arbitrary functional relation
with the usual sign restrictions

(0) $$P_I = F(P, EP^*/P).$$

Its reference value is

$$(P_I)_0 = F(P_0, (EP^*/P)_0).$$

Now define

$$P_I = (P_I)_0(1 + p_I), \quad P = P_0(1 + p), \quad P^* = P^*_0(1 + p^*) \quad \text{and} \quad E = E_0(1 + e).$$

Then retaining only the first order terms in the lower case variables in the
Taylor expansion of (0), we deduce that the variational variables are
related by

$$p_I = ap + b(p^* + e - p)$$

where a and b are elasticities evaluated on the reference paths and are given
as $a = (\partial F/\partial P)_0 (P_0/F_0)$ and $b = \{\partial F/\partial(EP^*/P)\}_0 (EP^*/P)_0/F_0$.

The variational model of a growth model will produce a secular growth
or trend time path. Many examples are worked out in Aoki [1981].
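To illustrate the first-order argument numerically, the sketch below computes the two elasticities by central differences and compares the linear relation with the exact proportional change of the index. For concreteness F is taken to be the product form above with a = 0.7, so the elasticities should come out near 0.7 and 0.3; the reference values, the deviations and the function names are all illustrative assumptions.

```python
def F(P, R, a=0.7):
    """Illustrative price index; R stands for the relative price E P*/P."""
    return P**a * R**(1 - a)

def elasticities(F, P0, R0, rel_step=1e-6):
    """Central-difference elasticities of F at the reference point (P0, R0)."""
    F0 = F(P0, R0)
    dP, dR = P0 * rel_step, R0 * rel_step
    a = (F(P0 + dP, R0) - F(P0 - dP, R0)) / (2 * dP) * P0 / F0
    b = (F(P0, R0 + dR) - F(P0, R0 - dR)) / (2 * dR) * R0 / F0
    return a, b

# Hypothetical reference values and small proportional deviations
P0, E0, Pstar0 = 1.0, 1.0, 1.0
p, e, pstar = 0.01, -0.005, 0.02

R0 = E0 * Pstar0 / P0
a_hat, b_hat = elasticities(F, P0, R0)

# Exact proportional deviation of the index
P = P0 * (1 + p)
R = E0 * (1 + e) * Pstar0 * (1 + pstar) / P
pI_exact = F(P, R) / F(P0, R0) - 1

# First-order (variational) relation
pI_linear = a_hat * p + b_hat * (pstar + e - p)

print(a_hat, b_hat, pI_exact, pI_linear)
```

Replacing F by another smooth function changes only the computed elasticities; the linear relation keeps the same form.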




7 PREDICTION OF TIME SERIES 



7.1 Prediction Space 



The covariance matrix of a stacked data vector $(y_1', y_2', \ldots, y_N')'$ of a
mean-zero weakly stationary process $\{y_t\}$ has a special structure: A submatrix
$\Lambda_0 = Ey_t y_t'$ is located along the main diagonal, the matrix $\Lambda_\ell = Ey_{t+\ell}y_t'$ along the
$\ell$-th diagonal below the main diagonal, and $\Lambda_{-\ell} = Ey_t y_{t+\ell}' = \Lambda_\ell'$ along the $\ell$-th
diagonal above the main diagonal. This covariance matrix is a block Toeplitz matrix
because the same submatrices are arranged in the same way that elements are
arranged in Toeplitz matrices.

When we relate stacked predicted future vectors $y_{t+s|t}$, s = 1, ... to the
current and past exogenous noise vectors, we obtain another matrix of special
structure (not unrelated to Toeplitz matrices as discussed in Appendix A.13),
called a Hankel matrix. More importantly perhaps, Hankel matrices naturally
arise in our attempt to predict the future realization of $\{y_t\}$ and in constructing
its Markovian model by calculating the covariance matrix between a stacked data
vector $(y_t', y_{t-1}', \ldots, y_{t-N}')'$ and stacked future realizations $(y_{t+1}', \ldots, y_{t+K}')'$
for some positive N and K. Hankel matrices also arise in several other contexts:
in calculating dynamic multipliers or impulse responses, in approximating
impulse responses by those of some low-order dynamics, and in some identification
conditions.

Use of an impulse response sequence or an MA form is one way to state the
dynamic response of a discrete-time system disturbed by exogenous impulses
(shocks). Let a (matrix) sequence $\{H_i\}$ relate $y_t$ to current and past shocks by

(1) $$y_t = H_0\varepsilon_t + H_1\varepsilon_{t-1} + H_2\varepsilon_{t-2} + \cdots$$

where $y_t$ is the current observation and $\varepsilon$'s are the exogenous shocks.
Suppose that a Markovian model of $\{y_t\}$ is given by

(2) $$x_{t+1} = Ax_t + B\varepsilon_t, \qquad y_t = Cx_t + D\varepsilon_t.$$

The z-transform of (2) easily relates y's to $\varepsilon$'s, which is the z-transform
counterpart of (1),

$$y(z) = H(z)U(z)$$

where

$$U(z) = \sum_k \varepsilon_k z^{-k}, \qquad y(z) = \sum_k y_k z^{-k},$$

and

(3) $$H(z) = \sum_k H_k z^{-k} = C(zI - A)^{-1}B + D.$$

We recognize the last to be the transfer function of the dynamics (2).

The impulse response (matrix) $H_i$ is the dynamic multiplier (matrix) which
measures the effect of a past action or disturbance (vector) on the current
data (vector) $y_t$. Appendix A.18 further discusses the multipliers. From (3)
we can state $H_i$ in the system parameters:

(4) $$H_i = \begin{cases} CA^{i-1}B, & i \geq 1, \\ D, & i = 0. \end{cases}$$

The H's are called Markov parameters. If $\{\varepsilon_t\}$ is a weakly stationary mean-zero
white noise sequence, i.e., $E\varepsilon_t = 0$, $E\varepsilon_t\varepsilon_s' = \delta_{ts}I$, then $H_i$ can also be
interpreted as the covariance between $y_t$ and the disturbance i periods earlier,

$$H_i = E(y_t\varepsilon_{t-i}').$$

In a multivariate ARMA model, the exogenous and endogenous variables are
related by

$$\phi(L)y_t = \psi(L)\varepsilon_t$$

where $\phi(L)$ and $\psi(L)$ are matrices of polynomials, i.e., $\phi(L) = \sum_i A_i L^i$ and
$\psi(L) = \sum_i B_i L^i$, where $A_i$ and $B_i$ are matrices of appropriate dimensions. In other
words, each element of $\phi(L)$ and $\psi(L)$ is a polynomial in L. Formally inverting
$\phi(L)$, $y_t$ and $\varepsilon_t$ are related by

$$y_t = H(L)\varepsilon_t$$

where

$$H(L) = \phi(L)^{-1}\psi(L) = \sum_{i=0}^{\infty} H_i L^i$$

is the transfer function. (It is called the left matrix fraction description
(MFD) of the transfer function.) The transfer function is causal if $H_i = 0$ for
negative i and $\|H(0)\|$ is finite.

Now, advance time in (1) successively to write $y_{t+1}$, $y_{t+2}$, ... and obtain
their conditional expectation (orthogonal projection onto the subspace spanned
by $\varepsilon_t, \varepsilon_{t-1}, \ldots$) expressed as a linear combination of the current and past $\varepsilon$'s.
In the notes we use the notation $y_{t+i|t}$ to denote the conditional mean of $y_{t+i}$
given the information available at time t, i.e., $\varepsilon_t, \varepsilon_{t-1}, \ldots$, i.e.,

$$y_{t+i|t} = E(y_{t+i} \mid \varepsilon_t, \varepsilon_{t-1}, \ldots) = H_i\varepsilon_t + H_{i+1}\varepsilon_{t-1} + \cdots, \qquad i = 1, 2, \ldots$$

When these predictions of future observations are stacked into an infinite
dimensional column vector, this vector is related to the stacked conditioning vectors
by the (infinite) matrix $\mathcal{H}$,

$$\begin{pmatrix} y_{t+1|t} \\ y_{t+2|t} \\ \vdots \end{pmatrix} = \mathcal{H} \begin{pmatrix} \varepsilon_t \\ \varepsilon_{t-1} \\ \vdots \end{pmatrix}$$

where

$$\mathcal{H} = \begin{pmatrix} H_1 & H_2 & H_3 & \cdots \\ H_2 & H_3 & H_4 & \cdots \\ \vdots & \vdots & \vdots & \end{pmatrix}.$$

This matrix has the same submatrix $H_i$ along counter diagonal lines (lines
running from lower left to upper right, perpendicular to diagonal lines). A matrix
with this feature is called a Hankel matrix. So we call $\mathcal{H}$ a block Hankel matrix.

We later use another block Hankel matrix in which the submatrices $H_i$ are not
the impulse response matrices but are covariance matrices $Ey_{t+\ell}y_t'$, $\ell$ = 1, 2, ...

When we stack only a finite number of predictions $y_{t+s|t}$, s = 1, 2, ..., N,
then we note that an upper left-hand corner of $\mathcal{H}$,

$$\mathcal{H}_N = \begin{pmatrix} H_1 & \cdots & H_N \\ \vdots & & \vdots \\ H_N & \cdots & H_{2N-1} \end{pmatrix},$$

relates $y_{t+1|t}$ through $y_{t+N|t}$ to $\varepsilon_t$ through $\varepsilon_{t-N+1}$. If $y_t$ is p-dimensional
then $\mathcal{H}_N$ is an (Np x Np) matrix.

From the definitional relations in (4) we observe that $\mathcal{H}$ is a product of
two semi-infinite matrices

$$\mathcal{O} = \begin{pmatrix} C \\ CA \\ CA^2 \\ \vdots \end{pmatrix} \quad \text{and} \quad \mathcal{C} = [B, AB, \ldots, A^{N-1}B, \ldots].$$

The (N x N) block matrix $\mathcal{H}_N$ is the product of $\mathcal{O}_N$ and $\mathcal{C}_N$, two finite
submatrices of $\mathcal{O}$ and $\mathcal{C}$:

(5) $$\mathcal{H}_N = \mathcal{O}_N \mathcal{C}_N.$$

These two matrices are exactly the observability and controllability
matrices, respectively, so important in system theory because they appear as
important technical conditions in many optimization and filtering problems.
See Aoki [1976] for details on these matrices.

Now, if (2) is a correct minimal-dimensional state space representation of $\{y_t\}$,
then the ranks of both $\mathcal{O}_N$ and $\mathcal{C}_N$ are equal to the dimension of the state vector $x_t$.
Let n = dim $x_t$. We know from system theory that the ranks of $\mathcal{O}_N$ and $\mathcal{C}_N$ are at
most n. From this fact and the relation $\mathcal{H}_N = \mathcal{O}_N\mathcal{C}_N$ we conclude that rank $\mathcal{H}_N$ is
at most n. The rank is exactly n once Np $\geq$ n, for minimal dimensional state
space models, i.e., if (2) is controllable and observable. The regular pattern
of submatrices in $\mathcal{H}$ tells us that the (row) rank of $\mathcal{H}$ can not be larger than
that of $\mathcal{H}_N$ for some suitable N. The row rank of $\mathcal{H}_N$ therefore tells us the
dimension of a state vector which can represent $\{y_t\}$ via a state space model
(2). The state vector dimension need not be an integer multiple of the dimension
of y. Just because a component of $y_t$, say the third component, enters the state
vector does not mean that the third component of $y_{t-1}$ also enters into the state
vector. We later discuss in detail choices of basis vectors to span the row
space of $\mathcal{H}$, i.e., to span the predictor space.
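The rank statement can be illustrated on a small case. The sketch below uses an arbitrary two-dimensional controllable and observable system chosen purely for illustration, with hypothetical function names. It forms the Markov parameters $CA^{i-1}B$, arranges them in a finite block Hankel matrix, and checks both the factorization into observability and controllability matrices and the rank.

```python
import numpy as np

def markov_parameters(A, B, C, count):
    """Return [H_1, ..., H_count] with H_i = C A^{i-1} B."""
    out = []
    M = B.copy()
    for _ in range(count):
        out.append(C @ M)
        M = A @ M
    return out

def block_hankel(H, N):
    """N x N block Hankel matrix whose (i, j) block (1-based) is H_{i+j-1}."""
    return np.block([[H[i + j] for j in range(N)] for i in range(N)])

# Illustrative two-state system (assumed, not from the notes)
A = np.array([[0.5, 1.0], [0.0, 0.3]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])

N = 3
H = markov_parameters(A, B, C, 2 * N - 1)
hankel = block_hankel(H, N)

# Finite observability and controllability matrices
obs = np.vstack([C @ np.linalg.matrix_power(A, i) for i in range(N)])
ctr = np.hstack([np.linalg.matrix_power(A, j) @ B for j in range(N)])

print(np.allclose(hankel, obs @ ctr))
print(np.linalg.matrix_rank(hankel))
```

Increasing N enlarges the Hankel matrix without raising its rank above the state dimension, which is the property later used to choose the dimension of a model.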

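When we calculate the covariance between a stacked data vector $(y_t', y_{t-1}', \ldots, y_{t-N+1}')'$ and the future vector $(y_{t+1}', \ldots, y_{t+N}')'$, we obtain an important example of the Hankel matrix

$$E\left\{ \begin{bmatrix} y_{t+1} \\ y_{t+2} \\ \vdots \\ y_{t+N} \end{bmatrix} [\, y_t', \; y_{t-1}', \; \ldots, \; y_{t-N+1}' \,] \right\} = \begin{bmatrix} \Lambda_1 & \Lambda_2 & \cdots & \Lambda_N \\ \Lambda_2 & \Lambda_3 & \cdots & \Lambda_{N+1} \\ \vdots & & & \vdots \\ \Lambda_N & \Lambda_{N+1} & \cdots & \Lambda_{2N-1} \end{bmatrix}. \tag{6}$$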

This matrix is important for us because we construct a state-space model of the time series $\{y_t\}$ by operating on this and related Hankel matrices. To relate this Hankel matrix to the one we just discussed, suppose that $\{y_t\}$ is modeled by

$$x_{t+1} = A x_t + F e_t, \qquad y_t = C x_t + e_t, \tag{7}$$

where $\{e_t\}$ is the usual mean-zero, serially uncorrelated exogenous process. (We later obtain a model of this type by constructing a Kalman filter in Chapter 10.)

Then the covariance submatrix $\Lambda_t$ is equal to $E y_t y_0'$ by the weak stationarity of the $y_t$ process. On the assumption that $x_0$ is uncorrelated with $e_0$, hence with all $e_t$, $t > 0$, as well, and by solving (7) as $y_t = e_t + C\left(A^t x_0 + \sum_{\tau=0}^{t-1} A^{t-1-\tau} F e_\tau\right)$, the covariance has the structure*

$$\Lambda_t = E y_t y_0' = C A^{t-1} M, \tag{8}$$

where

$$M = A \Pi C' + F$$

and

$$\Pi = E x_0 x_0' = E x_t x_t'.$$

The last equality follows if $A$ is a stable matrix because $\{x_t\}$ will then be weakly stationary.

Comparing (8) with (4), we note that the Hankel matrix of (6) can also be factored as shown by (5) because the $\Lambda$'s and the entries of $\mathcal{H}$ have the same structure. The matrix $B$ is simply replaced with $M$ in $\mathcal{C}_N$, i.e., $\mathcal{C}_N$ of (5) is replaced with $(M, AM, \ldots, A^{N-1}M)$. Before we turn to the important topic of estimating $A$, $C$ and $M$, we note an invariance property: the rank of $\mathcal{H}$ is invariant with respect to a similarity transformation, i.e., an equivalent choice of another state vector does not alter the rank of $\mathcal{H}$. This is easy to verify.
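The factorization (5) and the rank statement can be checked numerically. The following is a minimal sketch, not part of the original notes: the matrices `A`, `M`, `C` are hypothetical values chosen only so that the model is controllable and observable, and `block_hankel` is an illustrative helper name.

```python
import numpy as np

def block_hankel(markov, N):
    """Arrange markov[0], markov[1], ... into an N x N block Hankel matrix."""
    return np.block([[markov[i + j] for j in range(N)] for i in range(N)])

# Hypothetical 2-state, 1-output model (illustrative values only).
A = np.array([[0.5, 0.2], [0.0, 0.3]])
M = np.array([[1.0], [0.5]])
C = np.array([[1.0, 0.0]])
N = 4

# markov[k] = C A^k M, i.e. Lambda_{k+1} in the notation of (8).
markov = [C @ np.linalg.matrix_power(A, k) @ M for k in range(2 * N - 1)]
H = block_hankel(markov, N)

# Finite observability-type and controllability-type factors of (5).
O = np.vstack([C @ np.linalg.matrix_power(A, k) for k in range(N)])
Ctrl = np.hstack([np.linalg.matrix_power(A, k) @ M for k in range(N)])

# Numerical rank; the tolerance is an assumption suited to this small example.
rank_H = np.linalg.matrix_rank(H, tol=1e-10)
print(np.allclose(H, O @ Ctrl), rank_H)
```

Here the Hankel matrix is 4 x 4 but its numerical rank equals the state dimension 2, as the text asserts for a minimal model.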



* Under some technical conditions, sample covariance matrices $(1/K)\sum_{t=0}^{K-1} y_{t+\ell}\, y_t'$ converge to $E(y_{t+\ell}\, y_t')$ as $K \to \infty$.







7.2 Equivalence 



Suppose we are presented with two Markovian models

$$\text{(S)} \qquad z_{t+1} = A z_t + B x_t, \qquad y_t = C z_t + D x_t,$$

and

$$\text{(S*)} \qquad w_{t+1} = F w_t + G x_t, \qquad y_t = H w_t + D x_t.$$

In addition, we are told that the state vectors $z$ and $w$ are related by a nonsingular transformation $T$,

$$z_t = T w_t.$$

Then these two models are different representations of the same dynamics with respect to two different coordinate systems if the matrices satisfy the relations

$$F = T^{-1} A T, \qquad G = T^{-1} B, \qquad H = C T.$$

When the transfer matrices of these two models are examined, they are equal because

$$D + H(zI - F)^{-1} G = D + CT(zI - T^{-1}AT)^{-1} T^{-1} B = D + C(zI - A)^{-1} B.$$

Alternatively, from (4) we can say that these two equivalent systems possess the same set of Markov parameters

$$H F^i G = (CT)(T^{-1} A^i T)(T^{-1} B) = C A^i B, \qquad i = 0, 1, \ldots$$

We call (S) and (S*) equivalent if such a nonsingular transformation $T$ exists, and write $S \sim S^*$, using $\sim$ to denote equivalence. Hence the entries of the Hankel matrices are the same for equivalent model representations.
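The equivalence relations can be illustrated with a short numerical sketch. The matrices and the transformation `T` below are hypothetical, and the direct term $D$ is omitted because it is common to both models.

```python
import numpy as np

# Hypothetical model (S) and an arbitrary nonsingular T (illustrative values).
A = np.array([[0.5, 0.2], [0.0, 0.3]])
B = np.array([[1.0], [0.5]])
C = np.array([[1.0, 0.0]])
T = np.array([[1.0, 1.0], [0.0, 2.0]])

# Model (S*) in the new coordinates z = T w.
Tinv = np.linalg.inv(T)
F, G, H = Tinv @ A @ T, Tinv @ B, C @ T

def markov(A, B, C, count):
    """Markov parameters C A^i B for i = 0, ..., count - 1."""
    return [C @ np.linalg.matrix_power(A, i) @ B for i in range(count)]

def transfer(A, B, C, z):
    """C (zI - A)^{-1} B evaluated at a complex point z."""
    return C @ np.linalg.solve(z * np.eye(A.shape[0]) - A, B)

same_markov = all(
    np.allclose(m1, m2) for m1, m2 in zip(markov(A, B, C, 6), markov(F, G, H, 6))
)
z0 = 1.5 + 0.5j  # any point that is not an eigenvalue of A
same_transfer = np.allclose(transfer(A, B, C, z0), transfer(F, G, H, z0))
print(same_markov, same_transfer)
```

Because the Markov parameters agree, the Hankel matrices built from them agree entry by entry, which is the invariance used in the text.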
7.3 Cholesky Decomposition and Innovations

Predicting future realizations of a time series is one of the basic reasons for analyzing time series. The inverse system we introduced in Chapter 3 is linked to a special and easy case of prediction which we mention in passing. We give a more detailed exposition later in several related chapters.

Suppose $\{y_t\}$ is a scalar-valued weakly stationary process. Predicting future $y$'s from the data set $y_1, y_2, \ldots, y_{t-1}$ becomes particularly simple if we can express the $y$'s in an MA form

$$y_t = \psi(L) e_t, \qquad \text{where } \psi(L) = 1 + \beta_1 L + \cdots + \beta_q L^q,$$

and $\{e_t\}$ is a serially uncorrelated mean-zero weakly stationary process with variance $\sigma^2$. This is because $E(y_t \mid e_{t-1}, \ldots, e_1)$ is given by $\beta_1 e_{t-1} + \beta_2 e_{t-2} + \cdots + \beta_q e_{t-q}$ if we know $e_1, e_2, \ldots, e_{t-1}$. How do we obtain these $e$'s? They can be obtained by factoring the covariance matrix of the data vector $(y_1, y_2, \ldots, y_{t-1})'$. Later we show that the $e$'s can also be generated by Kalman filters.

Let $\Sigma$ be the covariance matrix of a stacked vector $z = (y_1, \ldots, y_n)'$, i.e., $\Sigma = E(zz')$. Factoring $\Sigma$ into the product form $\sigma^2 CC'$, where the matrix $C$ is a lower triangular matrix with ones along the main diagonal, we can represent the $y$'s as a linear combination of the $e$'s,

$$z = Cu,$$

where $u = (e_1, \ldots, e_n)'$, because $E(zz') = C\,E(uu')\,C' = \sigma^2 CC' = \Sigma$.

This factorization of the covariance matrix expresses the $y$'s as a linear combination of uncorrelated disturbances. Conversely, the uncorrelated shocks can be constructed from the $y$'s by inverting the matrix $C$,

$$e_j = \sum_{k=1}^{j} a_{jk}\, y_k,$$

where the matrix $A = (a_{jk})$ is the inverse of the matrix $C$. Note that this is in an AR form. We can check that the $e$'s thus constructed are serially uncorrelated.

This factorization, called the Cholesky decomposition, is exactly the same as the Gram-Schmidt orthogonalization procedure frequently used in nonlinear programming algorithms. In the geometry of mean-zero random variables with finite covariances, this analogy becomes exact. The Cholesky factorization orthogonalizes the data vectors into uncorrelated noise vectors. Let us rephrase this fact using the notion of "innovations" because we frequently deal with this notion in later sections, especially in discussing Kalman filters and estimating model parameters from data. A set of $N$ independent (mean-zero, finite variance) vectors $\tilde{y}_i$, $i = 1, \ldots, N$, is called innovations of the set of (mean-zero, finite variance) data vectors $y_i$, $i = 1, \ldots, N$, if for any $k$ the $\sigma$-field generated by $\tilde{y}_i$, $i = 1, \ldots, k$, is identical to that generated by $y_i$, $i = 1, \ldots, k$.

In the geometric language of Hilbert space, the subspaces spanned by $\tilde{y}_i$, $i = 1, \ldots, k$, and by $y_i$, $i = 1, \ldots, k$, coincide. Intuitively, a set of innovation vectors carries the same amount of information as that contained in a set of data vectors.

Such innovation vectors do not always exist for general data vectors. For Gaussian random vectors, however, the innovation vectors always exist. We construct them by the Gram-Schmidt, or if you prefer, the Cholesky decomposition method:



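$$\tilde{y}_1 = y_1,$$

$$\tilde{y}_i = y_i - E(y_i \mid y_1, \ldots, y_{i-1}), \qquad i = 2, \ldots, N.$$

For Gaussian vectors, we know that $\tilde{y}_i$ is independent of $y_1, \ldots, y_{i-1}$. We also know that the conditional expectations of random vectors that are jointly Gaussian are linear in the conditioning vectors. We thus write the above as $\tilde{y}_i = y_i - \sum_{j=1}^{i-1} a_{ij}\, y_j$, or

$$\begin{bmatrix} \tilde{y}_1 \\ \tilde{y}_2 \\ \vdots \\ \tilde{y}_N \end{bmatrix} = \begin{bmatrix} I & & & \\ -a_{21} & I & & \\ \vdots & & \ddots & \\ -a_{N1} & \cdots & -a_{N,N-1} & I \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}.$$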



Because this block lower triangular matrix is nonsingular, the data vectors are expressible as linear combinations of the innovation vectors. For $i > j$, we calculate $E(\tilde{y}_i \tilde{y}_j') = E[E(\tilde{y}_i \tilde{y}_j' \mid y_1, \ldots, y_{i-1})]$ to see that it is zero. Here $\tilde{y}_j$ is measurable with respect to the $\sigma$-field generated by $y_1, \ldots, y_{i-1}$, i.e., $\tilde{y}_j$ lies in the subspace spanned by $y_1, \ldots, y_{i-1}$ by construction. Hence $E(\tilde{y}_i \tilde{y}_j' \mid y_1, \ldots, y_{i-1}) = E(\tilde{y}_i \mid y_1, \ldots, y_{i-1})\, \tilde{y}_j' = E(\tilde{y}_i)\, \tilde{y}_j' = 0$. Here we use the independence of $\tilde{y}_i$ and $y_1, \ldots, y_{i-1}$. The case $i < j$ is treated similarly. For Gaussian vectors, uncorrelatedness is equivalent to independence. Thus $\tilde{y}_i$, $i = 1, \ldots, N$, are innovations, as was to be proved.*

As we noted above, prediction is easy with this form. We have the representation

$$y_{t+1} = e_{t+1} + c_{t+1,t}\, e_t + \cdots + c_{t+1,1}\, e_1.$$

Denoting the conditional mean of $y_{t+1}$ given $e_1, \ldots, e_t$ by $y_{t+1|t}$, we can write

$$y_{t+1|t} = c_{t+1,t}\, e_t + \cdots + c_{t+1,1}\, e_1.$$

For weakly stationary processes the coefficients should depend only on the time differences, i.e., $c_{t+1,s}$ becomes $c_{t+1-s}$. We return to this and other points later.
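The construction of innovations by Cholesky factorization can be sketched numerically. The example below is illustrative and not from the original notes: it assumes an MA(1) process $y_t = e_t + \beta e_{t-1}$ with unit innovation variance and a hypothetical $\beta = 0.5$. For a finite record the factorization takes the form $\Sigma = C\,\mathrm{diag}(d)\,C'$ with $C$ unit lower triangular, and the variances $d_i$ only approach the common value $\sigma^2$ as the record grows, which is why the code keeps them separate.

```python
import numpy as np

beta, n = 0.5, 6  # hypothetical MA(1) coefficient and record length

# Covariance matrix of (y_1, ..., y_n): R(0) = 1 + beta^2, R(1) = beta.
r = np.zeros(n)
r[0], r[1] = 1 + beta**2, beta
Sigma = np.array([[r[abs(i - j)] for j in range(n)] for i in range(n)])

L = np.linalg.cholesky(Sigma)   # Sigma = L L', L lower triangular
d = np.diag(L) ** 2             # variances of the constructed shocks
Cmat = L / np.diag(L)           # divide column j by L[j, j]: unit lower triangular
Amat = np.linalg.inv(Cmat)      # AR form: u = Amat z

# Covariance of the constructed shocks u = Amat z.
cov_u = Amat @ Sigma @ Amat.T
print(np.round(d, 4))
print(np.allclose(cov_u, np.diag(d)))
```

The covariance of `u` is diagonal, so the constructed shocks are mutually uncorrelated, and the last entries of `d` are already close to the innovation variance 1.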



* The Cholesky decomposition, however, suffers from computational problems. It is a linearly convergent algorithm and does not converge fast near the solution; see Pagano [1976]. Quadratically convergent algorithms for solving the algebraic Riccati equations are reported in the systems literature. They can be applied to provide efficient factorization algorithms for the spectrum. We return to this topic elsewhere. See also Hewer [1971] or Molinari [1975].




8 SPECTRUM AND COVARIANCES



8.1 Covariance and Spectrum 



The discrete Fourier transform of a finite data record of a real mean-zero time series with a regular sampling interval $T$ is defined by

$$X(\omega) = \sum_{n=0}^{N-1} x_n e^{-j\omega n T},$$

where $NT$ is the total time span covered by the data points $x_0, \ldots, x_{N-1}$. It is the same as the truncated z-transform when we let $z = e^{j\omega T}$. Because $x_n$ is a random variable, so is $X(\omega)$.

Its covariance is calculated to be

$$E(X(\omega)X(\omega)^*) = \sum_n \sum_m R(n-m)\, e^{-j\omega(n-m)T} = \sum_{u=-(N-1)}^{N-1} (N - |u|)\, R(u)\, e^{-j\omega u T}, \tag{1}$$

where

$$E x_n x_m' = R(n-m).$$

Note that $E x_m x_n' = R(m-n) = R'(n-m)$. Divide (1) by $N$ and let $N$ go to infinity. Define the limit $S(\omega)$ as

$$S(\omega) = \lim_{N \to \infty} \frac{1}{N} E(X(\omega)X(\omega)^*) = \sum_{u=-\infty}^{\infty} R(u)\, e^{-j\omega u T}.$$

This is called the (power) spectrum or the spectral density of the time series.

This equation is of the form of a Fourier series expansion, hence by the inversion formula for the Fourier transform we recover $R(k)$, with the sampling interval normalized to $T = 1$, by

$$R(k) = \frac{1}{2\pi} \int_{-\pi}^{\pi} S(\omega)\, e^{j\omega k T}\, d\omega.$$

Of particular interest is the covariance of $x_0$,

$$E x_0 x_0' = R(0) = \frac{1}{2\pi} \int_{-\pi}^{\pi} S(\omega)\, d\omega.$$

The z-transform of the covariance sequence (which is also called the covariance generating function) is defined by

$$S(z) = \sum_{n=-\infty}^{\infty} R(n)\, z^{-n}.$$

We recognize that the spectral density is obtained by setting $z = e^{j\omega T}$ in the above. The assumption that $\sum_k \|R(k)\|^2 < \infty$ ensures that $S(e^{j\omega T})$ is well-defined in the mean-square sense.*

From the relation $R(-n) = R'(n)$, we note that the z-transform of the covariances, or the covariance generating function, satisfies the relation

$$S(z^{-1}) = \sum_{n=-\infty}^{\infty} R(n)\, z^{n} = \sum_{n=-\infty}^{\infty} R'(-n)\, z^{n} = S'(z).$$

Alternatively, a covariance generating function can be written in a sum form

$$S(z) = G(z) + G'(z^{-1}),$$

where $\mathrm{Re}\, G(e^{j\omega T}) \ge 0$. The function $S(z)$ is positive on $|z| = 1$. When the $R$'s are scalar, $S(e^{j\omega T})$ is even in $\omega$:

$$S(e^{j\omega T}) = R(0) + \sum_{n=1}^{\infty} R(n)\, e^{-j\omega n T} + \sum_{n=-\infty}^{-1} R(n)\, e^{-j\omega n T}$$

$$= R(0) + \sum_{n=1}^{\infty} R(n)\, e^{-j\omega n T} + \sum_{n=1}^{\infty} R(-n)\, e^{j\omega n T}$$

$$= R(0) + 2 \sum_{n=1}^{\infty} R(n) \cos \omega n T.$$

To summarize, a spectrum $S(z)$ is the z-transform of a covariance sequence. Theoretically it satisfies the next three properties:

(i) $S'(1/z) = S(z)$ (sometimes called parahermitian),

(ii) $S(z)$ is analytic on $|z| = 1$,

(iii) $S(z) \ge 0$ on $|z| = 1$.

The sum form of $S(z)$ shows that (i) is true. Functions satisfying (ii) and (iii) are called positive real functions.

* The literature often uses the covariance generating function defined by $S(x) = \sum_n R(n)\, x^n$. Here $x$ is merely a place marker with no intrinsic meaning. The z-transform is the generating function when $x$ is identified with $z^{-1}$.

Now regard a mean-zero stationary stochastic process $y(t)$ as being the output of a time-invariant linear causal dynamic system with another mean-zero stationary stochastic process $x(t)$ as input:

$$y(t) = \sum_{n=0}^{\infty} h_n\, x(t-n),$$

where $\{h_n\}$ is the impulse response sequence of this dynamics. By causality $h_n$ is zero for all negative $n$. The discrete transfer function $H(z)$ is given as the one-sided z-transform

$$H(z) = \sum_{n=0}^{\infty} h_n z^{-n}, \qquad H(e^{j\omega T}) = \sum_{n=0}^{\infty} h_n e^{-j\omega n T}.$$

Easy calculations show that the output covariance matrix is given by

$$R_{yy}(k) = E[y(t+k)\, y'(t)] = E\left[\left\{\sum_{m=0}^{\infty} h(m)\, x(t+k-m)\right\} \left\{\sum_{\ell=0}^{\infty} h(\ell)\, x(t-\ell)\right\}'\right] = \sum_{m} \sum_{\ell} h(m)\, R_{xx}(k+\ell-m)\, h'(\ell), \tag{2}$$

where $R_{xx}(n)$ is the input covariance matrix, and $R_{yy}(k)$ that of the output.

By definition the spectrum of the $y$ series equals

$$S_{yy}(\omega) = \sum_{k=-\infty}^{\infty} R_{yy}(k)\, e^{-j\omega k T} \tag{3}$$

and that of the $x$ series is given by

$$S_{xx}(\omega) = \sum_{n=-\infty}^{\infty} R_{xx}(n)\, e^{-j\omega n T}. \tag{4}$$



Substitute (2) into (3) and use (4) to rewrite the spectral density of $y$ in terms of that of $x$ as

$$S_{yy}(\omega) = \sum_{\tau=-\infty}^{\infty} \sum_{m=0}^{\infty} \sum_{\ell=0}^{\infty} h(m)\, R_{xx}(\tau+\ell-m)\, h'(\ell)\, e^{-j\omega \tau T}$$

$$= \sum_{m=0}^{\infty} h(m)\, e^{-j\omega m T} \sum_{\tau=-\infty}^{\infty} R_{xx}(\tau+\ell-m)\, e^{-j\omega(\tau+\ell-m)T} \sum_{\ell=0}^{\infty} h'(\ell)\, e^{j\omega \ell T}$$

$$= H(e^{j\omega T})\, S_{xx}(\omega)\, H'(e^{-j\omega T}). \tag{5}$$

This important equation relates the spectral densities of the output and input via the transfer function. It is a form of the spectral factorization results. Let the variable $z$ correspond to $e^{j\omega T}$. Then we can factor the spectrum thus:

$$S_{yy}(z) = H(z)\, S_{xx}(z)\, H(z^{-1})^*, \tag{6}$$

where $*$ denotes conjugate transpose.

A serially uncorrelated input sequence is called a white noise sequence, i.e., $R_\ell = E x_{n+\ell}\, x_n' = 0$ for $\ell \ne 0$. Then the spectrum $S_{xx}(z)$ is a constant independent of $z$. The spectral density of a dynamic system with a white noise sequence as input can be factored, then, as $H(z)\, \Sigma\, H(z^{-1})^*$, where $\Sigma$ is the noise covariance matrix, $E x_n x_n' = \Sigma$.
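Relation (5) can be checked numerically for a white-noise input. The sketch below is illustrative only: the finite impulse response `h`, the input variance `s2`, and the frequency grid are hypothetical choices. It computes the output spectrum once from the output covariances as in (3), and once as $H(e^{j\omega T})\, \Sigma\, H'(e^{-j\omega T})$.

```python
import numpy as np

h = np.array([1.0, 0.6, -0.3])  # hypothetical impulse response h_0, h_1, h_2
s2 = 2.0                        # variance of the white-noise input
T = 1.0                         # sampling interval

# With white-noise input, (2) reduces to R_yy(k) = s2 * sum_l h(k + l) h(l).
lags = np.arange(-(len(h) - 1), len(h))
R_yy = s2 * np.correlate(h, h, mode="full")

omega = np.linspace(-np.pi, np.pi, 9)

# Equation (3): transform of the output covariance sequence.
S_from_cov = np.array([np.sum(R_yy * np.exp(-1j * w * lags * T)) for w in omega])

# Equation (5): H(e^{jwT}) * s2 * H'(e^{-jwT}); for real h the last factor
# is the complex conjugate of H(e^{jwT}).
n = np.arange(len(h))
H = np.array([np.sum(h * np.exp(-1j * w * n * T)) for w in omega])
S_from_H = H * s2 * np.conj(H)

print(np.allclose(S_from_cov, S_from_H))
```

Both routes give the same real, nonnegative values on the unit circle, consistent with properties (i) through (iii) of the preceding discussion.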



8.2 Spectral Factorization

The previous section calculates the spectrum of $\{y_t\}$, given its model dynamics or its transfer function, and a mean-zero white noise sequence as its input. Spectral factorization can be thought of as the converse process of generating $\{y_t\}$ as the output of a linear dynamic system driven by white noise, given the spectrum, or equivalently the covariance generating function, of the $y$-process.

From the previous section we know that the covariance or the correlation coefficient of a real-valued process $\{y_t\}$ is real and even, and that the covariance generating function is made up of the sum of a function $G(\cdot)$ evaluated at $z$ as well as at $z^{-1}$, $S(z) = G(z) + G(z^{-1})$, hence $S(z) = S(z^{-1})$, and that $S(z) \ge 0$ on $|z| = 1$ because the Toeplitz matrix $[R(i-j)]$, $1 \le i, j \le m$, is positive semi-definite for any $m$. Because the coefficients are real, zeros of $S(z)$ are either real or occur in complex conjugate pairs in the complex z-domain. In addition, because $S(z)$ equals $S(z^{-1})$, if $z = z_1$ is a zero, so is $z_1^{-1}$. The zeros of $S(z)$ hence occur in fours (in reciprocal pairs if real), with the possible exception of zeros that are exactly on the unit circle $|z| = 1$. The latter occur in twos (complex conjugate pairs) unless $z = \pm 1$. By collecting appropriate factors, then, we can factor $S(z)$ in a form corresponding to (6),

$$S(z) = W(z)\, W^*(z^{-1}),$$

where $W(z)$ collects all zeros lying in $|z| \le 1$. The zeros on $|z| = 1$ are equally allocated to $W(z)$ and $W^*(z^{-1})$. This is the scalar version of the Spectral Factorization theorem. A basic result for vector-valued processes is:

Theorem. Let $S(z)$ be a real rational full rank covariance generating function. Then it can be factored as $S(z) = W(z)\, \Sigma\, W^*(z^{-1})$, where $W(z)$ is real, rational, stable, of minimum phase, and $\Sigma^* = \Sigma > 0$.

Consequently, $W^{-1}(z)$ is analytic in $|z| > 1$. We can then write, for $|z| > 1$, $W^{-1}(z) = \sum_{k=0}^{\infty} C_k z^{-k}$, where the Taylor series expansion is valid in $|z| > 1$. The matrix $W^{-1}(z)$ is the z-transform of a stable causal (one-sided) dynamics, hence $W^{-1}(z)$ is a causally stable dynamic system called a whitening filter, and $e_t = W^{-1}(z)\, y_t$ is the input white noise. We have already mentioned another way to factor a spectrum, by Cholesky factorization of the covariance matrix. A third way, to be discussed subsequently, is to generate innovation sequences by Kalman filters and factor the spectrum accordingly.

Let

$$S(z) = \sum_{h=-m}^{m} R_h z^{-h}.$$

Then

$$z^m S(z) = \sum_{h=-m}^{m} R_h z^{m-h} = \sum_{k=0}^{2m} R_{m-k}\, z^{k}$$

is a polynomial of degree $2m$.

A stochastic process taking its value in a real Euclidean space has a real-valued spectrum $S(e^{-j\omega})$, i.e., $S(z)^* = S'(z^{-1})$. So if $z_k$ is a zero of $S(z)$, then so is $z_k^{-1}$, if $z_k$ is real. If $z_k$ is complex, then $z_k^*$ is also a zero.



Let $\{\gamma_k, \gamma_k^*\}$ be a set of complex roots of $z^m S(z) = 0$ with $|\gamma_k| > 1$, also containing half of those roots with $|\gamma_k| = 1$. Let $\rho_j$ be a real root. Then we can factor $z^m S(z)$ as

$$z^m S(z) = \text{const} \prod_{k=1}^{h} (z - \gamma_k)(z - \gamma_k^*) \prod_{j=1}^{\ell} (z - \rho_j) \prod_{k=1}^{h} (z - \gamma_k^{-1})(z - \gamma_k^{*-1}) \prod_{j=1}^{\ell} (z - \rho_j^{-1}),$$

where

$$2h + \ell = m.$$

Let $\Phi(z) = \prod_{k=1}^{h} (z - \gamma_k)(z - \gamma_k^*) \prod_{j=1}^{\ell} (z - \rho_j)$. Then noting that $z^{-1} - \gamma = (1 - \gamma z)/z = -\gamma (z - \gamma^{-1})/z$, i.e., $z(z^{-1} - \gamma) = -\gamma (z - \gamma^{-1})$, we see that $\Phi(z)$ has no zero in $|z| < 1$. If there is no $\gamma_k$ or $\rho_j$ of modulus 1, then $\Phi(z)$ has no zero in $|z| \le 1$, i.e., it is the z-transform of a strictly minimum delay, i.e., minimum phase, filter.
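The root-sorting argument can be tried on a small scalar example. This is a sketch under stated assumptions, not the author's algorithm: the covariances are generated from a hypothetical MA(2) filter `w_true` with innovation variance `s2_true`, and the factor is assembled from the roots of $z^m S(z)$ lying inside the unit circle, which is the allocation used for $W(z)$ in the scalar factorization $S(z) = W(z)W^*(z^{-1})$ when $W$ is written in powers of $z^{-1}$.

```python
import numpy as np

# Hypothetical MA(2) filter and innovation variance used to generate covariances.
w_true = np.array([1.0, 0.5, 0.06])
s2_true = 2.0
m = len(w_true) - 1

# Covariances R_{-m}, ..., R_0, ..., R_m (symmetric in the scalar case).
R = s2_true * np.correlate(w_true, w_true, mode="full")

# Coefficients of z^m S(z), highest power first: R_{-m}, ..., R_m.
roots = np.roots(R)

# Keep the m roots inside the unit circle; their reciprocals are the rest.
inside = roots[np.abs(roots) < 1.0]

# z^m W(z) = prod (z - root): coefficients 1, w_1, ..., w_m of W(z) in z^{-1}.
w_hat = np.real(np.poly(inside))
s2_hat = R[m] / np.sum(w_hat**2)   # match R_0 = s2 * sum of squared coefficients

print(np.round(w_hat, 6), round(float(s2_hat), 6))
```

The recovered coefficients and variance reproduce the covariance sequence, and the roots of $z^m S(z)$ appear in reciprocal pairs as the text describes.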
System theoretic construction provides an alternative to direct spectral factorization of covariance sequences. The covariance generating functions $S(z)$ are naturally given as a sum $S(z) = G(z) + G'(z^{-1})$ because $R_{-k} = R_k'$. Its spectral factorization expresses it as $S(z) = W(z)\, W^*(z^{-1})$, where $W(z)$ is analytic in $|z| \ge 1$ and of minimum phase, i.e., has its zeros inside the unit circle, and rank $W(z) = r$ in $|z| > 1$ if rank $S(z) = r$. The matrix $W(z)$, called the spectral factor, is unique up to right multiplication by an orthogonal, real-valued constant matrix. The function $G(z)$ is called positive real in the systems literature if it is analytic in $|z| \ge 1$, $G(z) + G^*(z) \ge 0$, and $G(\infty)$ is finite.



We now describe an algorithm for calculating the spectral factor due to Anderson et al. [1974]. One of the system theoretic results on positive realness is that $G(z)$ is positive real if there exists a symmetric positive semi-definite matrix $P$ such that

$$M(P) = \begin{bmatrix} APA' - P & APC' + \Gamma \\ (APC' + \Gamma)' & CPC' + 2I \end{bmatrix} \ge 0, \tag{7}$$

where $(A, \Gamma, C)$ is a minimal realization of $G(z)$, i.e.,

$$G(z) = I + C(zI - A)^{-1}\Gamma,$$

where rank $(\Gamma, A\Gamma, \ldots) = r = $ rank $(C', A'C', \ldots)$.

This is easily established. Suppose such a $P$ exists. Then factor $M(P)$ as

$$M(P) = \begin{bmatrix} \hat{\Gamma} \\ I \end{bmatrix} \Sigma \begin{bmatrix} \hat{\Gamma}' & I \end{bmatrix} \tag{8}$$

and construct

$$W(z) = I + C(zI - A)^{-1}\hat{\Gamma}. \tag{9}$$

We can show that $W(z)\, \Sigma\, W'(z^{-1}) = G(z) + G'(z^{-1})$ by straightforward substitution when $\hat{\Gamma}\Sigma$, $\Sigma\hat{\Gamma}'$ and $\hat{\Gamma}\Sigma\hat{\Gamma}'$ are substituted out by the corresponding expressions from (7).

To recapitulate: Let $S(z)$ be a rational spectrum with full rank for almost all $z$. It is given by $\sum_{\ell=-\infty}^{\infty} \Lambda_\ell z^{-\ell}$, where $\Lambda_0 = D$ and $\Lambda_\ell = C A^{\ell-1} M$, $\ell \ge 1$. Then the spectral factorization theorem tells us that $S(z)$ can be uniquely factored as $W(z)\, \Sigma\, W'(z^{-1})$, $\Sigma = \Sigma' > 0$, where $W(z)$ has all poles and all zeros inside the unit disc, $|z| < 1$, i.e., the poles of $W^{-1}(z)$ are also all inside the unit disc, and $\lim_{z \to \infty} W(z) = I$. The spectrum $S(z)$ can also be written as $S(z) = G(z) + G'(z^{-1})$. By construction of $S(z)$, the matrix $G(z)$ is readily given by $D/2 + C(zI - A)^{-1}M$, which is realized as the transfer function of the dynamic model

$$z_{t+1} = A z_t + M v_t, \qquad y_t = C z_t + \tfrac{1}{2} D v_t.$$

A spectral factor $W(z)$ of $S(z)$ is realizable as the transfer function of the innovation model

$$z_{t+1} = A z_t + \Gamma e_t, \qquad y_t = C z_t + e_t,$$

i.e.,

$$W(z) = I + C(zI - A)^{-1}\Gamma,$$

where

$$\Gamma = K\Sigma^{-1}, \qquad \operatorname{cov} e_t = \Sigma, \qquad M = APC' + K, \qquad D = CPC' + \Sigma, \qquad P = APA' + K\Sigma^{-1}K',$$

and

$$\Lambda_\ell = E y_\ell\, y_0' = C A^{\ell-1} M, \qquad D = E y_0\, y_0'.$$

We return to these topics in Chapter 10.
Spectral factorization naturally arises in filtering problems and in control problems. Consider a Markovian model

$$x_{t+1} = A x_t + u_t, \qquad y_t = C x_t + v_t,$$

where $\{u_t, v_t\}$ are jointly serially uncorrelated zero-mean processes with covariance

$$\operatorname{cov} \begin{bmatrix} u_t \\ v_t \end{bmatrix} = \begin{bmatrix} Q & N \\ N' & R \end{bmatrix}.$$

Its (discrete) spectrum matrix is

$$S(z) = R + C(zI - A)^{-1}N + N'(z^{-1}I - A')^{-1}C' + C(zI - A)^{-1}\, Q\, (z^{-1}I - A')^{-1}C'.$$

This can be factored in terms of a matrix called the return difference matrix,

$$T(z) = I + C(zI - A)^{-1}K,$$

i.e.,

$$S(z) = T(z)\, (R + CPC')\, T'(z^{-1}),$$

where

$$K = (APC' + N)(R + CPC')^{-1}$$

is the optimal Kalman filter gain and where $P$ is the positive definite solution of the algebraic Riccati equation

$$P = APA' - (APC' + N)(R + CPC')^{-1}(CPA' + N') + Q.$$

This has been shown by several people. See Shaked [1979] or Chapter 10, for example.
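The return difference factorization can be verified numerically at a point on the unit circle. The sketch below is illustrative: the model matrices are hypothetical, and the Riccati equation is solved by plain fixed-point iteration from $P = 0$, which is an assumption made for simplicity (it converges here because $A$ is stable) and not a recommendation over the faster algorithms cited in the text.

```python
import numpy as np

# Hypothetical model and noise covariances (illustrative values only).
A = np.array([[0.6, 0.2], [0.0, 0.4]])
C = np.array([[1.0, 0.5]])
Q = np.array([[1.0, 0.2], [0.2, 0.5]])
N = np.array([[0.1], [0.0]])
R = np.array([[1.0]])

# Fixed-point iteration on the algebraic Riccati equation.
P = np.zeros((2, 2))
for _ in range(500):
    Omega = R + C @ P @ C.T
    G = A @ P @ C.T + N
    P = A @ P @ A.T - G @ np.linalg.solve(Omega, G.T) + Q

Omega = R + C @ P @ C.T                      # R + C P C'
K = (A @ P @ C.T + N) @ np.linalg.inv(Omega) # Kalman filter gain

def resolvent(z):
    """(zI - A)^{-1}."""
    return np.linalg.inv(z * np.eye(2) - A)

def spectrum(z):
    left = C @ resolvent(z)               # C (zI - A)^{-1}
    right = resolvent(1 / z).T @ C.T      # (z^{-1} I - A')^{-1} C'
    return R + left @ N + N.T @ right + left @ Q @ right

def factored(z):
    Tz = np.eye(1) + C @ resolvent(z) @ K
    Tz_inv_t = (np.eye(1) + C @ resolvent(1 / z) @ K).T   # T'(z^{-1})
    return Tz @ Omega @ Tz_inv_t

z0 = np.exp(0.7j)   # a point on the unit circle
print(np.allclose(spectrum(z0), factored(z0)))
```

The two expressions agree, and on the unit circle the common value is real and positive, as a spectrum must be.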

Its dual problem is the optimal regulator problem: minimize $J$, where

$$J = \sum_{t=0}^{\infty} \left( x_{t+1}'\, Q\, x_{t+1} + u_t'\, R\, u_t \right),$$

subject to

$$x_{t+1} = A x_t + B u_t.$$

Here the optimal feedback signal is

$$u_t = -K x_t$$

with

K = r“ 1 B'P. 

The matrix $P$ is the solution of the same Riccati equation. Here the discrete return difference matrix is given by

$$T(z) = I + K(zI - A)^{-1}B$$

and

$$T'(z^{-1})\, (R + B'PB)\, T(z) = R + B'(z^{-1}I - A')^{-1}\, Q\, (zI - A)^{-1}B = \psi(z)$$

is the form of the factorization.

Optimal regulator problems and optimal filtering problems are called dual because the expressions for the regulator gains and filtering gains obey the same equations under a suitable one-to-one correspondence. See Appendix A.14 for further detail.



8.3 Computational Aspects 
Sample Covariance Matrices 



In the earlier section we listed three theoretical properties of spectral densities. When an expression for the theoretical spectrum is approximated by replacing true covariances with sample covariances, the approximate spectrum may or may not satisfy all three properties.

Sample covariances of $\{y_t\}$ are commonly calculated from a finite data set $y_0, \ldots, y_{N-1}$ by

$$\hat{R}_k = \frac{1}{N} \sum_{i=0}^{N-k-1} y_{i+k}\, y_i', \qquad k = 0, \ldots, N-1. \tag{10}$$

This estimate is consistent but biased. However, this approximation leads to an approximate spectrum which satisfies positive semi-definiteness.

The next example, due to van Zee [1981], shows that the unbiased estimates obtained by replacing $1/N$ by $1/(N-k)$ may lead to an approximate spectrum which is indefinite. For this reason, $\hat{R}_k$ of (10) is preferred.



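Example. Let $N = 3$ and $(y_0, y_1, y_2) = (1, 0, -1)$. With $(N-k)$ replacing $N$, the sample covariances are

$$R_0 = \tfrac{1}{3}(1 + 1) = 2/3, \qquad R_1 = 0, \qquad R_2 = -1.$$

However, the resulting approximation to the covariance (Toeplitz) matrix,

$$T_3 = \begin{bmatrix} 2/3 & 0 & -1 \\ 0 & 2/3 & 0 \\ -1 & 0 & 2/3 \end{bmatrix},$$

is indefinite. (It has one negative eigenvalue.)

Now (10) calculates the approximate covariances as

$$\hat{R}_0 = 2/3, \qquad \hat{R}_1 = 0, \qquad \hat{R}_2 = -1/3.$$

The matrix is positive semi-definite:

$$\begin{bmatrix} 2/3 & 0 & -1/3 \\ 0 & 2/3 & 0 \\ -1/3 & 0 & 2/3 \end{bmatrix}.$$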

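To see that the positive semi-definiteness is preserved with this approximation, let

$$T_N = \begin{bmatrix} \hat{R}_0 & \hat{R}_1' & \cdots & \hat{R}_{N-1}' \\ \hat{R}_1 & \hat{R}_0 & \cdots & \hat{R}_{N-2}' \\ \vdots & & \ddots & \vdots \\ \hat{R}_{N-1} & \hat{R}_{N-2} & \cdots & \hat{R}_0 \end{bmatrix}.$$

It can be written in the factored form $(1/N)\, YY'$, which shows that $T_N \ge 0$, where

$$Y = \begin{bmatrix} 0 & \cdots & 0 & y_0 & \cdots & y_{N-1} \\ \vdots & & & & & \vdots \\ y_0 & \cdots & y_{N-1} & 0 & \cdots & 0 \end{bmatrix} : N \times (2N-1),$$

each block row being the data record shifted one position to the left relative to the row above it.

If the infinite-dimensional matrix $T$ is defined with $T_N$ as its $N \times N$ submatrix in its upper left corner and zero everywhere else, and if $Y$ is similarly extended so that $T = \frac{1}{N}\, YY'$, then $T \ge 0$. An alternative proof based on partial realization is given by Kimura [1982].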
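The example and the factored form can be reproduced in a few lines. This is a minimal sketch using the data of the example; the helper names `sample_cov` and `toeplitz_from` are illustrative and not from the original text.

```python
import numpy as np

y = np.array([1.0, 0.0, -1.0])   # the data (y_0, y_1, y_2) of the example
N = len(y)

def sample_cov(y, k, unbiased):
    """Lag-k sample covariance with divisor N - k (unbiased) or N (biased)."""
    n = len(y)
    total = np.sum(y[k:] * y[: n - k])
    return total / (n - k) if unbiased else total / n

def toeplitz_from(r):
    """Symmetric Toeplitz matrix with (i, j) entry r[|i - j|]."""
    n = len(r)
    return np.array([[r[abs(i - j)] for j in range(n)] for i in range(n)])

r_unbiased = [sample_cov(y, k, True) for k in range(N)]
r_biased = [sample_cov(y, k, False) for k in range(N)]

eig_unbiased = np.linalg.eigvalsh(toeplitz_from(r_unbiased))
eig_biased = np.linalg.eigvalsh(toeplitz_from(r_biased))

# Factored form T_N = (1/N) Y Y': row i holds the record shifted left i places.
Y = np.zeros((N, 2 * N - 1))
for i in range(N):
    Y[i, N - 1 - i : 2 * N - 1 - i] = y
T_factored = Y @ Y.T / N

print(np.round(eig_unbiased, 4))   # smallest eigenvalue is negative
print(np.round(eig_biased, 4))     # all eigenvalues are nonnegative
```

The divisor $N - k$ produces one negative eigenvalue, while the divisor $N$ yields a matrix that coincides with $(1/N)\, YY'$ and is therefore positive semi-definite.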




9 ESTIMATION OF SYSTEM MATRICES: INITIAL PHASE 



A Markov model of a weakly stationary time series is constructed by operating on the Hankel matrix made up of the covariances $\Lambda_\ell = E y_{t+\ell}\, y_t'$. In deterministic models, the dimensions of their state space are obtained as the theoretical ranks of the associated Hankel matrices. In stochastic models, rows of Hankel matrices contain noises and the ranks must be determined numerically. Here, system theory has contributed a procedure for approximate model construction by calculating singular values of Hankel matrices, and then properly scaling variables. This second step is known as selecting internally balanced models based on relative sizes of singular values. We also comment on the close relation between the canonical correlation method of Akaike [1976] and the singular value decomposition procedure.

We first describe how to construct full-dimensional models. Then we suggest 
a method for approximate model construction by examining relative sizes of the 
singular values of the Hankel matrices. Constructing approximate ARMA or Markov 
models of low-order this way lets the orders of the approximate models be 
suggested by data. This property seems to be quite desirable for any model 
construction method. We later say more on refining the models thus obtained 
by further optimization steps which maximize the likelihood functions adjusted 
for the number of parameters used in the models. 

9.1 System Matrices



The Hankel matrix in the product form of Chapter 7 shows us a way to construct the system matrices $A$, $B$, $C$ in a Markovian representation of a time series $\{y_t\}$. Suppose we can construct a Markovian model of a weakly stationary process $\{y_t\}$ as

$$x_{t+1} = A x_t + u_t, \quad E x_0 = 0, \qquad y_t = C x_t + v_t, \tag{1}$$

where $\{u_t\}$ and $\{v_t\}$ are mean-zero serially uncorrelated weakly stationary processes with

$$\operatorname{cov} \begin{bmatrix} u_t \\ v_t \end{bmatrix} = \begin{bmatrix} Q & N \\ N' & R \end{bmatrix}.$$

Denote the covariances of $\{y_t\}$ by $\{\Lambda_\ell\}$. They are given by

$$\Lambda_\ell = E y_\ell\, y_0' = C A^{\ell-1} M, \quad \ell \ge 1, \qquad \Lambda_0 = C \Pi C' + R, \tag{2}$$

$$M = A \Pi C' + N, \qquad \Pi = E x_0 x_0'.$$

The weak stationarity and the dynamic equation imply that $\Pi$ satisfies the matrix equation

$$\Pi = A \Pi A' + Q. \tag{3}$$



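A truncation of the Hankel matrix made up of the covariance matrices $\Lambda_\ell$ can be written as

$$\mathcal{H}_N = \begin{bmatrix} \Lambda_1 & \Lambda_2 & \cdots & \Lambda_N \\ \Lambda_2 & \Lambda_3 & \cdots & \Lambda_{N+1} \\ \vdots & & & \vdots \\ \Lambda_N & \Lambda_{N+1} & \cdots & \Lambda_{2N-1} \end{bmatrix} = \begin{bmatrix} C \\ CA \\ \vdots \\ CA^{N-1} \end{bmatrix} [M, AM, \ldots, A^{N-1}M].$$

Shift up the submatrices in $\mathcal{H}_N$ by one submatrix row and fill in the bottom submatrix row accordingly to define

$$\mathcal{H}_A = \begin{bmatrix} \Lambda_2 & \cdots & \Lambda_{N+1} \\ \Lambda_3 & \cdots & \Lambda_{N+2} \\ \vdots & & \vdots \\ \Lambda_{N+1} & \cdots & \Lambda_{2N} \end{bmatrix} = \begin{bmatrix} C \\ CA \\ \vdots \\ CA^{N-1} \end{bmatrix} A\, [M, AM, \ldots, A^{N-1}M].$$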



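Take the first submatrix column of $\mathcal{H}_N$ to define

$$\mathcal{H}_M = \begin{bmatrix} \Lambda_1 \\ \Lambda_2 \\ \vdots \\ \Lambda_N \end{bmatrix} = \begin{bmatrix} C \\ CA \\ \vdots \\ CA^{N-1} \end{bmatrix} M,$$

and the first block submatrix row is named

$$\mathcal{H}_C = [\Lambda_1, \ldots, \Lambda_N] = C\, [M, AM, \ldots, A^{N-1}M].$$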



The singular value decomposition theorem (see Appendix A.12) tells us that matrices $U$ and $V$ exist such that

$$U'U = I, \qquad V'V = I,$$

and

$$\mathcal{H}_N = U \Sigma V',$$

where the matrix $\Sigma$ arranges the singular values of $\mathcal{H}_N$ in decreasing order of magnitude on the main diagonal. Then noting that $\Sigma^{-1/2} U' \mathcal{H}_N V \Sigma^{-1/2} = I$, we construct

$$A = \Sigma^{-1/2} U' \mathcal{H}_A V \Sigma^{-1/2}. \tag{4}$$

From the expression for $\mathcal{H}_M$, we construct

$$M = \Sigma^{-1/2} U' \mathcal{H}_M. \tag{5}$$

Similarly, from $\mathcal{H}_C$,

$$C = \mathcal{H}_C V \Sigma^{-1/2}. \tag{6}$$

These construction steps can be related to the notion of the system matrix of Rosenbrock [1970]. Bosgra and van der Weiden [1980] and van Zee [1981] proposed the procedures followed in this section. See Bosgra and van der Weiden for the proof of these relations.
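Constructions (4), (5) and (6) can be tried on exact covariances generated from a known model. The following is a minimal sketch under stated assumptions: the "true" matrices `A0`, `M0`, `C0` are hypothetical, the output is scalar, and the state dimension `n` is taken as known so that only the `n` nonzero singular values are retained (with noisy sample covariances the rank would have to be judged from the singular values).

```python
import numpy as np

# Hypothetical true model used only to generate exact covariances.
A0 = np.array([[0.5, 0.2], [0.0, 0.3]])
M0 = np.array([[1.0], [0.5]])
C0 = np.array([[1.0, 0.0]])
N, n = 4, 2   # Hankel block size and assumed state dimension

# lam[k] = Lambda_{k+1} = C0 A0^k M0, for k = 0, ..., 2N - 1.
lam = [C0 @ np.linalg.matrix_power(A0, k) @ M0 for k in range(2 * N)]

def hankel(first):
    """N x N block Hankel matrix whose (i, j) block is lam[first + i + j]."""
    return np.block([[lam[first + i + j] for j in range(N)] for i in range(N)])

H_N = hankel(0)        # Lambda_1, ..., Lambda_{2N-1}
H_A = hankel(1)        # shifted by one block row
H_M = H_N[:, :1]       # first block column (scalar output, so one column)
H_C = H_N[:1, :]       # first block row

U, s, Vt = np.linalg.svd(H_N)
U, s, V = U[:, :n], s[:n], Vt[:n, :].T
S_inv_half = np.diag(s ** -0.5)

A = S_inv_half @ U.T @ H_A @ V @ S_inv_half   # equation (4)
M = S_inv_half @ U.T @ H_M                    # equation (5)
C = H_C @ V @ S_inv_half                      # equation (6)

lam_hat = [C @ np.linalg.matrix_power(A, k) @ M for k in range(2 * N)]
print(all(np.allclose(a, b) for a, b in zip(lam, lam_hat)))
print(np.sort(np.linalg.eigvals(A).real))
```

The recovered triple differs from the generating one by a similarity transformation, so the covariances and the eigenvalues of $A$ are reproduced even though the matrices themselves are not.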

Arranging the covariance matrices $\{\Lambda_\ell\}$ into the Hankel matrix, we can estimate the system matrices $A$ and $C$ of the state space model (1) by (4) and (6). To estimate $\Pi$, i.e., the covariance matrix of the initial state vector $x_0$, we can use (5) if the noise covariance $N$ is known. For example, the relation below (2) can be used to solve for vec $\Pi$ from vec $M = (C \otimes A)$ vec $\Pi$ + vec $N$. This matrix $\Pi$ must, of course, be consistent with (2) and (3), i.e., if $Q$ and $R$ of the noise covariance matrices are known, then $\Pi$ must satisfy $\Pi = A \Pi A' + Q$ and $\Lambda_0 = C \Pi C' + R$. Once we know $A$, $C$ and $M$, and the covariance sequence $\{\Lambda_\ell\}$, we can estimate $\Pi$ and the noise covariance matrices $Q$, $N$ and $R$ by

$$R = \Lambda_0 - C \Pi C',$$

$$N = M - A \Pi C',$$

and

$$Q = \Pi - A \Pi A',$$

where $\Pi$ is symmetric positive semi-definite and must be such that

$$\begin{bmatrix} Q & N \\ N' & R \end{bmatrix} \ge 0.$$

Among all such $\Pi$'s, we need the minimum $\Pi^*$ in the usual partial ordering of symmetric positive definite matrices, $\Pi \ge \Pi^*$, because $\Pi^*$ is associated with the Kalman filter estimates, as we later show in Chapter 10. See Faurre [1976], for example. Note also that the model (1) can be replaced by the innovation model we construct in Chapter 10. Nothing of substance changes. We resume our discussion of phase two in Chapter 10 after we first introduce a few more useful concepts related to Hankel matrices.
9.2 Approximate Model

Start with a full dimensional innovation model, $x_{t+1} = A x_t + B e_t$, $y_t = C x_t + e_t$, which is derived in Chapter 10. The idea that relative magnitudes of singular values of the Hankel matrices give us a way to construct approximate models can be quickly grasped by partitioning a state space vector into two subvectors,

$$x_t = \begin{bmatrix} x_t^1 \\ x_t^2 \end{bmatrix},$$

where $x_t^1$ is assumed to be a lower dimensional approximation to a more complete and higher dimensional vector $x_t$. Partition the model conformably and write

$$A = \begin{bmatrix} A_1 & A_{12} \\ A_{21} & A_2 \end{bmatrix}, \qquad B = \begin{bmatrix} B_1 \\ B_2 \end{bmatrix}, \qquad \text{and} \quad C = (C_1, C_2)$$

in the state space model. Then $x_{t+1}^1 = A_1 x_t^1 + B_1 e_t$, $y_t = C_1 x_t^1 + e_t$ is a lower-dimensional approximation to the model.

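The observability matrix $\mathcal{O}$ can be written as

$$\mathcal{O} = [\, \mathcal{O}_1 + \mathcal{O}_{12}, \;\; \mathcal{O}_2 \,],$$

where

$$\mathcal{O}_1 = \begin{bmatrix} C_1 \\ C_1 A_1 \\ C_1 A_1^2 \\ \vdots \end{bmatrix}.$$

The matrices $\mathcal{O}_{12}$ and $\mathcal{O}_2$ contain everything not explicitly carried by $\mathcal{O}_1$. Similarly, the controllability matrix is written in a partitioned form

$$\mathcal{C} = \begin{bmatrix} \mathcal{C}_1 + \mathcal{C}_{21} \\ \mathcal{C}_2 \end{bmatrix},$$

where

$$\mathcal{C}_1 = [\, B_1, \; A_1 B_1, \; A_1^2 B_1, \; \ldots \,].$$

The matrices $\mathcal{C}_{21}$ and $\mathcal{C}_2$ contain terms omitted in $\mathcal{C}_1$.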

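Because the Hankel matrix is the product of the observability and controllability matrices, the true $\mathcal{H}$ is expressible as

$$\mathcal{H} = \mathcal{H}_1 + \Delta\mathcal{H},$$

where $\mathcal{H}_1 = \mathcal{O}_1 \mathcal{C}_1$ is the Hankel matrix corresponding to the approximate model, and $\Delta\mathcal{H}$ contains all other cross-product expressions. The singular value decomposition theorem gives the size of the error

$$\|\Delta\mathcal{H}\| = \|\mathcal{H} - \mathcal{H}_1\|,$$

where we may use the Frobenius norm of a matrix, $\|X\|^2 = \operatorname{tr} X'X$, i.e., $\|\Delta\mathcal{H}\| = \left(\sum_{i=r+1}^{n} \sigma_i^2\right)^{1/2}$, or the spectral norm, $\|\Delta\mathcal{H}\| = \sigma_{r+1}$. Arrange the singular values of $\mathcal{H}$ in decreasing order of magnitude. If we decide to have an approximation accuracy of $\sigma_{r+1}$ using the spectral norm, then $\mathcal{H}_1$ retains the $r$ largest singular values $\sigma_1 \ge \cdots \ge \sigma_r$, and $x_t^1$ becomes $r$-dimensional. From our discussion on the approximate model construction, to produce an $r$-dimensional approximate model, partition $\Sigma$ as diag $(\Sigma_1, \Sigma_2)$, where $\Sigma_1 = $ diag $(\sigma_1, \ldots, \sigma_r)$, and $U$ and $V$ conformably: $U = (U_1, U_2)$, $V = (V_1, V_2)$.

Then

$$A_1 = \Sigma_1^{-1/2} U_1' \mathcal{H}_A V_1 \Sigma_1^{-1/2}, \tag{4'}$$

$$M_1 = \Sigma_1^{-1/2} U_1' \mathcal{H}_M, \tag{5'}$$

and

$$C_1 = \mathcal{H}_C V_1 \Sigma_1^{-1/2} \tag{6'}$$

are the system matrices associated with this $r$-dimensional approximate state space model for $\{y_t\}$.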
We can motivate the proposed approximate construction in another way. Because $\mathcal{H}$ represents the input-output characteristics of a dynamic model, an approximation of $\mathcal{H}$ produces an approximate dynamic model. Suppose that $r$ is the dimension of the approximate dynamic model. We must find $\mathcal{H}_r$ of rank $r$ that best approximates $\mathcal{H}$. The singular value decomposition shows that the Hankel matrix can be put as

$$\mathcal{H} = \sum_{i=1}^{n} \sigma_i u_i v_i',$$

where $u_i$ and $v_i$ are the eigenvectors of $\mathcal{H}\mathcal{H}'$ and $\mathcal{H}'\mathcal{H}$ respectively, both with eigenvalues $\sigma_i^2$, $i = 1, \ldots, n$. If we construct $\mathcal{H}_r = \sum_{i=1}^{r} \sigma_i u_i v_i'$, then $\mathcal{H}_r$ minimizes $\|\mathcal{H} - K\|$, where $\|\cdot\|$ is the spectral norm, among all matrices $K$ with rank $r$ or less. The minimum equals $\sigma_{r+1}$. By construction such an approximation is unique (Kung and Lin [1981]).
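The error of the truncated expansion can be checked numerically. The sketch below is illustrative only: the scalar covariance sequence `lam` is a hypothetical sum of three geometric modes, one of them weak, so the Hankel matrix has two dominant singular values and one small one.

```python
import numpy as np

# Hypothetical scalar covariances: two strong modes and one weak mode.
lam = [0.9**k + 0.5 * 0.2**k + 0.01 * (-0.6) ** k for k in range(9)]
H = np.array([[lam[i + j] for j in range(5)] for i in range(5)])

U, s, Vt = np.linalg.svd(H)
r = 2

# Rank-r approximation: sum of the r leading terms sigma_i u_i v_i'.
H_r = (U[:, :r] * s[:r]) @ Vt[:r, :]

err_spectral = np.linalg.norm(H - H_r, 2)
err_frobenius = np.linalg.norm(H - H_r, "fro")

print(np.round(s, 6))
print(err_spectral, err_frobenius)
```

The spectral-norm error equals the first discarded singular value, and the Frobenius-norm error equals the root sum of squares of all discarded singular values, which is how the relative sizes of the singular values suggest the order of the approximate model.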

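This approximating matrix $\mathcal{H}_r$ can be written as

$$\mathcal{H}_r = \mathcal{O}_1 \mathcal{C}_1,$$

where

$$\mathcal{O}_1 = U_1 \Sigma_1^{1/2} \qquad \text{and} \qquad \mathcal{C}_1 = \Sigma_1^{1/2} V_1',$$

where

$$U_1 = [u_1, \ldots, u_r], \qquad \Sigma_1^{1/2} = \operatorname{diag}(\sigma_1^{1/2}, \ldots, \sigma_r^{1/2}), \qquad V_1 = [v_1, \ldots, v_r].$$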
9.3 Rank Determination of Hankel Matrices: Singular Value Decomposition Theorem

Given a finite number of data vectors from a weakly stationary time series, 
we now know that the rank of the Hankel matrix is the same as the dimension of 
a state vector of a Markovian representation of the time series. Because the 
entries in the Hankel matrix are numerically calculated from the observed data, 
they have numerical errors associated with them. Numerical determination of the 
rank of a matrix is ordinarily quite sensitive to errors. We apply the singular 
value decomposition to the matrix to determine its rank reliably. The singular 
value decomposition involves only numerically stable procedures. 

The singular value decomposition theorem tells us that any m by $\ell$ matrix A
can be written as

$$ A = U \Sigma V' $$

where

$$ U'U = I_m, \qquad V'V = I_\ell, $$

and where rank $\Sigma$ = rank A = r $\le \min(m, \ell)$, in which the submatrix
$\Sigma_1 = \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_r)$ contains the only non-zero entries
in the $(m \times \ell)$ matrix $\Sigma$. See Strang [1973].
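
A minimal sketch of rank determination by singular values follows, assuming NumPy. The relative threshold `rel_tol`, the function name, and the randomly generated test matrix are illustrative assumptions, not part of the original procedure.

```python
import numpy as np

def numerical_rank(A, rel_tol=1e-8):
    """Count singular values above rel_tol times the largest one."""
    s = np.linalg.svd(A, compute_uv=False)
    if s[0] == 0.0:
        return 0, s
    return int(np.sum(s > rel_tol * s[0])), s

rng = np.random.default_rng(0)
# Hypothetical 6 x 5 matrix of exact rank two, then perturbed by small "numerical errors".
exact = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 5))
noisy = exact + 1e-10 * rng.standard_normal((6, 5))

rank_noisy, s = numerical_rank(noisy)
```

The perturbed matrix has full rank in exact arithmetic, yet the gap in the ordered singular values still points to rank two. The choice of threshold is a judgment call that depends on the size of the errors in the data.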

The following sections describe the way we use this decomposition to
construct state space models of time series. We have indicated in the previous
section that this decomposition is also used to approximate the state model
thus constructed by lower dimensional models, i.e., by state space models with
state vectors of lower dimensions. Because $A'AV = V\Sigma^2$, we can interpret V to
be the $\ell \times \ell$ matrix made up of $\ell$ independent eigenvectors of the $\ell \times \ell$ matrix $A'A$,
and $\Sigma^2 = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_r^2, 0, \ldots, 0)$ where $\sigma_i^2$, i = 1, ..., r, are the positive
eigenvalues of $A'A$. Similarly the relation $AA'U = U\Sigma^2$ shows us
that the m column vectors in the $m \times m$ matrix U are the eigenvectors of $AA'$ with
$\Sigma^2 = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_r^2, 0, \ldots, 0)$. We call $\sigma_i$ the singular values of A. Appendix
A.12 summarizes the other uses of this theorem. We use this decomposition in
Section 9 to relate the Hankel matrix method to the canonical correlation
method and to the principal component analysis of the covariance matrix.

9.4 Internally Balanced Model 

This section examines scalings of variables that enter the state
space representation of time series, and constructs state space models which
are numerically well-behaved. We mostly follow Moore [1978], and construct
what he calls "internally balanced" models.

Since numerical well-behavedness is a definitely desirable property,
constructing (or converting an existing state space model into) an internally
balanced model is an important step in the sequence of steps we take to
represent time indexed data. The whole process may consist of the following: (i)
obtain a state space model of time series either by converting an ARMA model,
somehow obtained, into a state space form as in Chapter 5 or by procedures of
Sections 1 and 2, (ii) choose an internally balanced model representation by
looking for a break in the ordering of singular values of the observability
(and controllability) grammians, and (iii) partition the original state vector
into two subvectors as suggested in step (ii) to obtain a lower-dimensional
approximate model. The Markov model for this subvector is the approximate
model which may be converted back into ARMA representation if desired. As
an added advantage of this procedure, it generates all approximate models of
lower dimensions than the one actually chosen.




Example The next example illustrates the importance of and our concern
over scaling. Improper scaling of variables causes some pathological
behavior in this example. The impulse response of the system described by

$$ \begin{pmatrix} v_{t+1} \\ w_{t+1} \end{pmatrix} = \begin{pmatrix} -1/2 & 0 \\ 0 & -1/3 \end{pmatrix} \begin{pmatrix} v_t \\ w_t \end{pmatrix} + \begin{pmatrix} 10^{-6} \\ 10^{6} \end{pmatrix} u_t, \qquad y_t = (10^{6} \;\; 10^{-6}) \begin{pmatrix} v_t \\ w_t \end{pmatrix}, $$

which is denoted by $h_i$, is equal to $(-1/2)^{i-1} + (-1/3)^{i-1}$, i = 1, 2, ..., and
appears well behaved, showing no obvious anomalies. We note, however, that
whatever change occurs in $u_t$ appears enormously magnified on the w variable but it
goes nearly unobserved. The opposite is true with the v variable. In other
words, this system is nearly uncontrollable and unobservable because the vectors
multiplying $u_t$ in the dynamics and the state vector in the $y_t$ equation have extreme
elements nearly cancelling each other out. This is reflected by the fact that
the ellipsoids associated with the controllability and observability grammians
are extremely flat. They are defined by

$$ G_o = \sum_{k=0}^{\infty} (A')^k C'C A^k, \qquad G_c = \sum_{k=0}^{\infty} A^k BB' (A')^k. $$

Here,

$$ G_o = \begin{pmatrix} (4/3)10^{12} & 6/5 \\ 6/5 & (9/8)10^{-12} \end{pmatrix}, $$

and $G_c$ has the same form with the exponents 12 and -12 interchanged.

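This example becomes better behaved by a mere rescaling of the components;
for example let $\tilde v_t = 10^{6} v_t$ and $\tilde w_t = 10^{-6} w_t$. This change of variables
produces the model

$$ \begin{pmatrix} \tilde v_{t+1} \\ \tilde w_{t+1} \end{pmatrix} = \begin{pmatrix} -1/2 & 0 \\ 0 & -1/3 \end{pmatrix} \begin{pmatrix} \tilde v_t \\ \tilde w_t \end{pmatrix} + \begin{pmatrix} 1 \\ 1 \end{pmatrix} u_t, \qquad y_t = (1 \;\; 1) \begin{pmatrix} \tilde v_t \\ \tilde w_t \end{pmatrix}. $$

Now the observability and controllability grammians are

$$ G_o = G_c = \begin{pmatrix} 4/3 & 6/5 \\ 6/5 & 9/8 \end{pmatrix}, $$

and the associated ellipsoids are no longer extremely flat.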
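
The effect of the rescaling in this example can be reproduced numerically. The sketch below, assuming NumPy, approximates the grammians by truncating their defining sums at 200 terms (an arbitrary choice that is ample for this stable A); the helper name `grammians` is hypothetical.

```python
import numpy as np

def grammians(A, B, C, terms=200):
    """Truncated sums G_c = sum A^k BB'(A')^k and G_o = sum (A')^k C'C A^k."""
    n = A.shape[0]
    G_c = np.zeros((n, n))
    G_o = np.zeros((n, n))
    Ak = np.eye(n)                      # holds A^k
    for _ in range(terms):
        G_c += Ak @ B @ B.T @ Ak.T
        G_o += Ak.T @ C.T @ C @ Ak
        Ak = Ak @ A
    return G_c, G_o

A = np.diag([-0.5, -1.0 / 3.0])
# Original (badly scaled) and rescaled input and output vectors of the example.
B_bad, C_bad = np.array([[1e-6], [1e6]]), np.array([[1e6, 1e-6]])
B_ok, C_ok = np.array([[1.0], [1.0]]), np.array([[1.0, 1.0]])

Gc_bad, Go_bad = grammians(A, B_bad, C_bad)
Gc_ok, Go_ok = grammians(A, B_ok, C_ok)
cond_bad = np.linalg.cond(Go_bad)       # enormous: the ellipsoid is extremely flat
cond_ok = np.linalg.cond(Go_ok)         # moderate after rescaling
```

The condition number is used here as a rough measure of how flat the grammian ellipsoid is; it is a convenience of this sketch rather than a quantity used in the text.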



Let A be asymptotically stable. A state space model (A, B, C) which
has the same diagonal matrix $\Sigma$ as its controllability and observability
grammian, i.e., $G_o = G_c = \Sigma$, is called internally balanced, i.e., (A, B, C)
is internally balanced if the matrix equations

$$ G_c = \sum_{k=0}^{\infty} A^k BB' (A')^k = \Sigma, \qquad \Sigma' = \Sigma, $$

and

$$ G_o = \sum_{k=0}^{\infty} (A')^k C'C A^k = \Sigma $$

hold. We also speak of an internally balanced representation when a coordinate
system in which A, B, and C are represented leads to an internally balanced
model. Because the controllability and observability grammians satisfy the
matrix algebraic relations $A G_c A' = G_c - BB'$ and $A' G_o A = G_o - C'C$ respectively,
the following two equations are simultaneously satisfied by the same diagonal
matrix $\Sigma$ when the system is internally balanced:

(7) $A \Sigma A' - \Sigma = -BB'$ and $A' \Sigma A - \Sigma = -C'C$.



Construction

The internally balanced representation can be constructed by following a
two-step procedure (Moore [1978]). The idea is related to the principal component
analysis in statistics. We return to this connection later in this section.
Also, see Appendix A.3. Let the controllability grammian for the system (A, B,
C) be $G_c$. The matrix $G_c$ has an orthonormal eigenvector matrix $\Gamma_c$ and the
diagonal eigenvalue matrix $\Lambda_c$, i.e., $G_c \Gamma_c = \Gamma_c \Lambda_c$ or $G_c = \Gamma_c \Lambda_c \Gamma_c'$.

(i) Change the coordinate so that $\tilde A = P^{-1}AP$, $\tilde B = P^{-1}B$, and $\tilde C = CP$ where we
choose P to be $\Gamma_c \Lambda_c^{1/2}$. Then

$$ \tilde G_c = P^{-1} G_c (P^{-1})' = \Lambda_c^{-1/2} \Gamma_c' \Gamma_c \Lambda_c \Gamma_c' \Gamma_c \Lambda_c^{-1/2} = I. $$

The observability grammian becomes $\tilde G_o = P' G_o P$. Let $\Gamma_o$ and $\Lambda_o$ be the eigenvector
and eigenvalue matrices of $\tilde G_o$: $\tilde G_o \Gamma_o = \Gamma_o \Lambda_o$.

(ii) Perform another change of variables so that

$$ \hat A = Q^{-1} \tilde A Q, \qquad \hat B = Q^{-1} \tilde B, \qquad \hat C = \tilde C Q, \qquad Q = \Gamma_o \Lambda_o^{-1/4}. $$

The controllability grammian becomes $\hat G_c = Q^{-1} \tilde G_c (Q^{-1})' = \Lambda_o^{1/2}$, and the
observability grammian becomes

$$ \hat G_o = Q' \tilde G_o Q = \Lambda_o^{-1/4} \Gamma_o' \Gamma_o \Lambda_o \Gamma_o' \Gamma_o \Lambda_o^{-1/4} = \Lambda_o^{1/2}, $$

completing the conversion to the internally balanced representation.
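
The two-step construction can be sketched in code as follows, assuming NumPy. The grammians are obtained here by solving the Lyapunov equations in vectorized form, which is an implementation choice of this sketch, and the system matrices are hypothetical.

```python
import numpy as np

def dlyap(A, Q):
    """Solve X = A X A' + Q for asymptotically stable A (vectorized form)."""
    n = A.shape[0]
    x = np.linalg.solve(np.eye(n * n) - np.kron(A, A), Q.reshape(-1))
    return x.reshape(n, n)

def balance(A, B, C):
    G_c = dlyap(A, B @ B.T)                       # controllability grammian
    G_o = dlyap(A.T, C.T @ C)                     # observability grammian
    # Step (i): P = Gamma_c Lambda_c^{1/2} makes the controllability grammian I.
    lam_c, gam_c = np.linalg.eigh(G_c)
    P = gam_c * np.sqrt(lam_c)
    P_inv = (gam_c / np.sqrt(lam_c)).T
    # Step (ii): Q = Gamma_o Lambda_o^{-1/4} equalizes the two grammians.
    lam_o, gam_o = np.linalg.eigh(P.T @ G_o @ P)
    Q = gam_o * lam_o ** -0.25
    Q_inv = (gam_o * lam_o ** 0.25).T
    T, T_inv = P @ Q, Q_inv @ P_inv
    return T_inv @ A @ T, T_inv @ B, C @ T

# Hypothetical stable, controllable and observable system.
A = np.array([[0.5, 0.2], [0.0, -0.3]])
B = np.array([[1.0], [0.5]])
C = np.array([[1.0, -1.0]])

A_b, B_b, C_b = balance(A, B, C)
Gc_b = dlyap(A_b, B_b @ B_b.T)
Go_b = dlyap(A_b.T, C_b.T @ C_b)
```

In the new coordinates `Gc_b` and `Go_b` coincide and are diagonal up to rounding, while the Markov parameters such as `C_b @ A_b @ B_b` are unchanged.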

Recalling our discussion on the singular value decomposition of the Hankel
matrix, we now show that the matrix $\Sigma$ there is the same as the $\Sigma$ we have
introduced in internally balanced representation. To see this, recall that the Hankel
matrix has the factored form $H = \mathcal{O}\mathcal{C}$. Hence $H'H = \mathcal{C}'\mathcal{O}'\mathcal{O}\mathcal{C}$. The product of the
observability grammian with the controllability grammian produces $G_o G_c = \mathcal{O}'\mathcal{O}\mathcal{C}\mathcal{C}'$.
Suppose $G_o$ and $G_c$ are both positive definite. We now show that nonzero
eigenvalues of $H'H$ are the eigenvalues of $G_o G_c$. Let u be an eigenvector of $G_o G_c$ with
$\lambda$ as its eigenvalue, $G_o G_c u = \lambda u$. Then $\mathcal{C}' G_o G_c u = H'H \mathcal{C}' u = \lambda \mathcal{C}' u$, i.e., $\mathcal{C}'u$ is an
eigenvector of $(H'H)_n$ with the same eigenvalue, where $(\cdot)_n$ denotes the $(n \times n)$
submatrix. Conversely, start from $(H'H)_n u = \lambda u$, i.e., $\mathcal{C}'\mathcal{O}'\mathcal{O}\mathcal{C} u = \lambda u$. If the
system is controllable and observable, then rank $\mathcal{O}'$ = rank $\mathcal{C}'$ = n, so that $\mathcal{C}'$ has
a left inverse, written $(\mathcal{C}')^{-1}$. Hence $\mathcal{O}'(\mathcal{O}\mathcal{C}) u = \lambda (\mathcal{C}')^{-1} u$. Let $u = \mathcal{C}'v$ to rewrite
it as $(\mathcal{O}'\mathcal{O}\mathcal{C}\mathcal{C}') v = G_o G_c v = \lambda v$, i.e., v is an eigenvector of $G_o G_c$ with the same
eigenvalue $\lambda$ of $(H'H)_n$. The above shows that if $\sigma_1^2 \ge \sigma_2^2 \ge \ldots \ge \sigma_n^2$ are the
eigenvalues of $G_o G_c$, then $\sigma_1 \ge \ldots \ge \sigma_n$ are the singular values of H. Because the Hankel
matrix H related to (A, B, C) is expressible as $\mathcal{O}\mathcal{C}$, the matrix $HH'$ equals
$\mathcal{O}\mathcal{C}\mathcal{C}'\mathcal{O}'$ or $\mathcal{O} G_c \mathcal{O}'$. In an internally balanced representation, $HH'$ then equals $\mathcal{O}\Sigma\mathcal{O}'$
where $\Sigma$ is a diagonal matrix. We then have $HH'\mathcal{O} = \mathcal{O}\Sigma\mathcal{O}'\mathcal{O} = \mathcal{O}\Sigma G_o = \mathcal{O}\Sigma^2$, where we use
$\mathcal{O}'\mathcal{O} = G_o = \Sigma$ in an internally balanced representation. We conclude
then that the elements of $\Sigma^2$ are the eigenvalues of $HH'$ or the squares of the
singular values of H.
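
The relation between the singular values of H and the eigenvalues of $G_o G_c$ can be checked on a small case. This is a sketch under stated assumptions: NumPy is available, the system is hypothetical, and the infinite Hankel matrix is replaced by a 30 by 30 truncation, which is accurate here because the Markov parameters decay quickly.

```python
import numpy as np

def dlyap(A, Q):
    """Solve X = A X A' + Q for asymptotically stable A (vectorized form)."""
    n = A.shape[0]
    x = np.linalg.solve(np.eye(n * n) - np.kron(A, A), Q.reshape(-1))
    return x.reshape(n, n)

# Hypothetical stable single-input single-output system of dimension two.
A = np.array([[0.5, 0.2], [0.0, -0.3]])
B = np.array([[1.0], [0.5]])
C = np.array([[1.0, -1.0]])

G_c = dlyap(A, B @ B.T)
G_o = dlyap(A.T, C.T @ C)

N = 30                                   # truncation size (assumption)
markov = [(C @ np.linalg.matrix_power(A, k) @ B).item() for k in range(2 * N - 1)]
H = np.array([[markov[i + j] for j in range(N)] for i in range(N)])

hankel_sv = np.linalg.svd(H, compute_uv=False)[:2]
gram_sv = np.sqrt(np.sort(np.linalg.eigvals(G_o @ G_c).real)[::-1])
```

The two leading singular values of the truncated Hankel matrix agree with the square roots of the eigenvalues of `G_o @ G_c` to within the truncation and rounding error.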

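By examining the controllability and observability matrices

$$ \hat{\mathcal{C}} = [\hat B, \hat A \hat B, \ldots] = Q^{-1}P^{-1}[B, AB, \ldots] = Q^{-1}P^{-1}\mathcal{C} $$

and

$$ \hat{\mathcal{O}} = \begin{pmatrix} \hat C \\ \hat C \hat A \\ \vdots \end{pmatrix} = \begin{pmatrix} C \\ CA \\ \vdots \end{pmatrix} PQ = \mathcal{O}PQ $$

we note that the Hankel matrices are related by

$$ \hat H = \hat{\mathcal{O}}\hat{\mathcal{C}} = \mathcal{O}PQQ^{-1}P^{-1}\mathcal{C} = H. $$

This is to be expected because the Hankel matrices provide an external description
of dynamics and are invariant with respect to basis choices to represent dynamics.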



Properties of Internally Balanced Models *

When the two equations in (7) determining the controllability and
observability grammians of an internally balanced model are combined, the grammian $\Sigma$
satisfies an algebraic matrix equation

$$ A'A \Sigma A'A - \Sigma = -(C'C + A'BB'A). $$

Let v be an eigenvector of $A'A$ with its corresponding eigenvalue $\lambda$. Then the
above equation yields, on multiplication by $v'$ from the left, and by v from the
right,

$$ (\lambda^2 - 1)\, v'\Sigma v = -v'(C'C + A'BB'A)v \le 0, $$

establishing that the eigenvalues of $A'A$ are less than or equal to one in
modulus. A further refinement of the argument can show that $|\lambda| < 1$ if the
eigenvalues of $\Sigma$ are all distinct (Pernabo and Silverman [1982]).

* This section follows Pernabo and Silverman [1982].

Consider a partition of an internally balanced model into two subsystems

$$ \begin{pmatrix} x^1_{t+1} \\ x^2_{t+1} \end{pmatrix} = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} \begin{pmatrix} x^1_t \\ x^2_t \end{pmatrix} + \begin{pmatrix} B_1 \\ B_2 \end{pmatrix} u_t $$

and

$$ y_t = (C_1 \;\; C_2) \begin{pmatrix} x^1_t \\ x^2_t \end{pmatrix}. $$

The associated controllability and observability grammians both become block
diagonal, $\Sigma = \mathrm{diag}(\Sigma_1, \Sigma_2)$. Suppose that the total system is asymptotically
stable, i.e., $\|A\| < 1$. From the construction of an internally balanced
representation, we know that $G_c = G_o = \Sigma$. Assume that $\Sigma$ is nonsingular. We now
establish that every subsystem of an asymptotically stable internally balanced
model is asymptotically stable. First, using the defining relation

$$ A_{11}\Sigma_1 A_{11}' + A_{12}\Sigma_2 A_{12}' - \Sigma_1 = -B_1 B_1' $$

and by multiplying it by $v^*$ and v from left and right respectively, we deduce

$$ (|\lambda|^2 - 1)\, v^*\Sigma_1 v = -(v^* A_{12}\Sigma_2 A_{12}' v + v^* B_1 B_1' v) \le 0, $$

where v is now redefined to satisfy $v^* A_{11} = \lambda v^*$, $v^* v = 1$. Because $v^*\Sigma_1 v > 0$,
it easily follows that $|\lambda| \le 1$. We can exclude the possibility that $|\lambda| = 1$
because then $v^* A_{12} = 0$ and $v^* B_1 = 0$ must follow because $\Sigma_2$ is positive
definite. But this implies that

$$ (v^*, 0) \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} = \lambda (v^*, 0) \quad \text{and} \quad (v^*, 0) \begin{pmatrix} B_1 \\ B_2 \end{pmatrix} = 0, $$

hence the system is not reachable, contrary to our assumption. We conclude
then that $|\lambda| < 1$ and the subsystem 1 is asymptotically stable. Since
subsystem 1 is any subsystem, subsystem 2 is also asymptotically stable.

Suppose we partition the total system according to the criterion
$\sigma_{\min}(\Sigma_1) > \sigma_{\max}(\Sigma_2)$ and that the observability grammian $\Sigma$ is diagonal. For the
subsystem 1, $\Sigma_1$ satisfies

$$ A_{11}'\Sigma_1 A_{11} + A_{21}'\Sigma_2 A_{21} - \Sigma_1 = -C_1'C_1. $$

If the subsystem 1 is not observable, there is a normalized eigenvector of
$A_{11}$, v, $v'v = 1$, satisfying $A_{11}v = \lambda v$ and $C_1 v = 0$. Multiplying the above
equation by $v'$ and v from the left and right, respectively, it becomes

(8) $(1 - |\lambda|^2)\, v'\Sigma_1 v = v' A_{21}'\Sigma_2 A_{21} v.$

Note that $v'\Sigma_1 v \ge \sigma_{\min}(\Sigma_1)$. We can also bound the right hand side by

$$ v' A_{21}'\Sigma_2 A_{21} v \le \|A_{21}v\|^2 \sigma_{\max}(\Sigma_2). $$

Internally balanced models are such that $\|A\| \le 1$. This implies in particular

$$ \left\| \begin{pmatrix} A_{11} \\ A_{21} \end{pmatrix} v \right\| \le 1, \quad \text{i.e.,} \quad \|A_{11}v\|^2 + \|A_{21}v\|^2 \le 1, $$

hence $\|A_{21}v\|^2 \le 1 - |\lambda|^2$. Substituting these into (8), we obtain

$$ (1 - |\lambda|^2)\, \sigma_{\min}(\Sigma_1) \le (1 - |\lambda|^2)\, \sigma_{\max}(\Sigma_2). $$

Previous results show that $|\lambda| < 1$, hence $\sigma_{\min}(\Sigma_1) \le \sigma_{\max}(\Sigma_2)$. This contradicts
the assumed criterion for partitioning subsystems. Hence we conclude that
subsystem 1 is observable. Proceeding analogously we also establish that
subsystem 1 is also reachable. Kung and Lin [1981] also discuss a model
reduction method using the singular value decomposition.
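
The properties derived above can be illustrated on a hypothetical three-state system. The sketch assumes NumPy; the system matrices, the helper names, and the choice r = 2 for the retained subsystem are illustrative only.

```python
import numpy as np

def dlyap(A, Q):
    """Solve X = A X A' + Q for asymptotically stable A (vectorized form)."""
    n = A.shape[0]
    x = np.linalg.solve(np.eye(n * n) - np.kron(A, A), Q.reshape(-1))
    return x.reshape(n, n)

def balance(A, B, C):
    """Return a balanced (A, B, C) and the diagonal of Sigma, largest first."""
    lam_c, gam_c = np.linalg.eigh(dlyap(A, B @ B.T))
    P = gam_c * np.sqrt(lam_c)
    P_inv = (gam_c / np.sqrt(lam_c)).T
    lam_o, gam_o = np.linalg.eigh(P.T @ dlyap(A.T, C.T @ C) @ P)
    order = np.argsort(lam_o)[::-1]               # sort so that Sigma_1 comes first
    lam_o, gam_o = lam_o[order], gam_o[:, order]
    T = P @ (gam_o * lam_o ** -0.25)
    T_inv = (gam_o * lam_o ** 0.25).T @ P_inv
    return T_inv @ A @ T, T_inv @ B, C @ T, np.sqrt(lam_o)

# Hypothetical stable, controllable and observable system.
A = np.array([[0.8, 0.1, 0.0], [0.0, 0.5, 0.2], [0.1, 0.0, -0.4]])
B = np.array([[1.0], [0.5], [0.2]])
C = np.array([[1.0, -0.5, 0.4]])

A_b, B_b, C_b, sigma = balance(A, B, C)
r = 2                                             # dimension of subsystem 1
A_11, B_1, C_1 = A_b[:r, :r], B_b[:r], C_b[:, :r]
radius_11 = max(abs(np.linalg.eigvals(A_11)))     # spectral radius of subsystem 1
norm_A = np.linalg.norm(A_b, 2)                   # spectral norm of balanced A
```

Here `norm_A` does not exceed one and `radius_11` is below one, in line with the two results above. In practice r would be chosen at a visible break in `sigma`.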



Principal Component Analysis 

The notion of internally balanced model corresponds to that of principal
components in statistics. Principal components are defined for a p-dimensional
random vector x with mean 0 and covariance matrix X. Because X is symmetric
and positive semi-definite, p normalized eigenvectors are used to define a $p \times p$
orthonormal matrix $\Gamma$ with $X\Gamma = \Gamma\Lambda$, where $\Lambda$ is the diagonal matrix made up
of the eigenvalues. By definition, the p components of $\Gamma'x$ are the p
principal components of x. The first principal component is produced by $\gamma_1'x$
where $\gamma_1$ is the normalized eigenvector corresponding to the largest eigenvalue
$\lambda_1$.

We note that the coordinate changes used to construct internally balanced
models calculate principal components:* In going from (A, B, C) to $(\tilde A, \tilde B, \tilde C)$,
the new state vector is related to the old one by $\tilde z_t = P^{-1}z_t = \Lambda_c^{-1/2}\Gamma_c' z_t$.
The components of $\Lambda_c^{1/2}\tilde z_t$ are exactly the principal components of $z_t$. Similarly,
the step from $(\tilde A, \tilde B, \tilde C)$ to $(\hat A, \hat B, \hat C)$ involves the change of variables
$\hat z_t = Q^{-1}\tilde z_t = \Lambda_o^{1/4}\Gamma_o'\tilde z_t$, i.e., aside from scaling, $\Gamma_o'\tilde z_t$ calculates the
principal components of $\tilde z_t$.

The relation $\mathrm{tr}\,\Gamma'X\Gamma = \mathrm{tr}\,X\Gamma\Gamma' = \mathrm{tr}\,X$ shows that the total variance of
the principal components is the same as that of x, i.e., $\sum_i \lambda_i$; the first
principal component explains $\lambda_1$ of $\sum_i \lambda_i$. Thus, if $\lambda_2 + \ldots + \lambda_p$ is small compared
with $\lambda_1$, the first principal component explains most of the variation of x.

This static definition can formally be extended to dynamic situations.
Suppose y is produced from a white noise sequence $e_t, e_{t-1}, \ldots$ by $y_{t+1} = \mathcal{C}u_t$ where
$\mathcal{C} = [B, AB, \ldots]$ and $u_t = (e_t', e_{t-1}', \ldots)'$. Assume that e is mean zero and
$E(e_t e_s') = I\delta_{ts}$. Then cov $(y_{t+1}) = \mathcal{C}\mathcal{C}' = G_c$. Then $\Gamma'y_{t+1}$ is the vector of
principal components where $G_c\Gamma = \Gamma\Lambda$ and $\Lambda$ is the diagonal matrix of the eigenvalues
of $G_c$. In an internally balanced model $G_c$ is already diagonal; hence the first
component of y is its principal component.
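
A minimal numerical sketch of the static definition follows, assuming NumPy. The 3 by 3 covariance matrix is hypothetical and chosen only so that the eigenvalues are clearly separated.

```python
import numpy as np

# Hypothetical covariance matrix X of a 3-dimensional random vector x.
X = np.array([[4.0, 1.5, 0.5],
              [1.5, 2.0, 0.3],
              [0.5, 0.3, 1.0]])

lam, gam = np.linalg.eigh(X)            # X gam = gam diag(lam), ascending order
order = np.argsort(lam)[::-1]           # reorder so the largest eigenvalue is first
lam, gam = lam[order], gam[:, order]

cov_pc = gam.T @ X @ gam                # covariance matrix of Gamma' x
share_first = lam[0] / lam.sum()        # share of total variance explained by gamma_1' x
```

The matrix `cov_pc` is diagonal with the eigenvalues on its diagonal, and its trace equals that of X, which is the trace identity used in the text.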



* Appendix A.3 summarizes relevant properties of the principal components.

9.5 Inference about the Model Order

How do we let data determine the order of an ARMA model? The answer
largely depends on our a priori belief in the appropriateness or correctness
of a class of ARMA models. If we firmly believe in ARMA models as correct
representations of dynamic mechanisms generating data, then consistency of
estimates of the order and the system parameters should be our primary concern.

If, on the other hand, ARMA models are used merely as convenient approximate 
representations of much more complex dynamic phenomena, then we have no compelling 
reason to emphasize consistency of estimates. We would rather strive to balance 
bias of estimates and loss of efficiency from employing too many system parameters, 
and try to achieve asymptotically efficient approximations to the true spectrum of 
the underlying process. Deistler et al. [1982] express a similar opinion.

From the former standpoint, a criterion

$$ \mathrm{BIC}(p, q) = \ln \sigma(p, q) + (p+q) \ln N / N $$

has been proposed by Rissanen [1976, 1983], Akaike [1973, 1976] and Schwarz
[1978], where p+q is the total number of parameters and $\sigma(p, q)$ is the standard
deviation of the innovation process. Adopting the latter point of view, Akaike
proposed

$$ \mathrm{AIC}(p, q) = \ln \sigma(p, q) + 2(p+q)/N $$

as a criterion to choose the order of the model ARMA(p, q).

Suppose that data $y_i$, i = 1, ..., N are independent random variables with
density p(y). This AIC criterion may be regarded as an approximation that
minimizes the distance between the true probability density function p(y) and its
best approximation $f(y, \hat\theta)$ chosen from a class of functions $f(y, \theta)$, where $\theta$ is
a finite dimensional vector and $\hat\theta$ is its estimate. Takeuchi [1983] explains
the approximations used to derive the AIC criterion.
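
The two criteria, written exactly as above, can be compared on a small set of candidate orders. In this sketch the innovation standard deviations and the sample size are hypothetical numbers chosen for illustration; in practice they would come from fitted models.

```python
import math

def aic(sigma, p, q, n_obs):
    """AIC(p, q) = ln sigma(p, q) + 2 (p + q) / N, as written in the text."""
    return math.log(sigma) + 2.0 * (p + q) / n_obs

def bic(sigma, p, q, n_obs):
    """BIC(p, q) = ln sigma(p, q) + (p + q) ln N / N, as written in the text."""
    return math.log(sigma) + (p + q) * math.log(n_obs) / n_obs

# Hypothetical innovation standard deviations for candidate orders (p, q).
candidates = {(1, 0): 1.30, (1, 1): 1.10, (2, 1): 1.08, (2, 2): 1.078}
n_obs = 200

best_aic = min(candidates, key=lambda o: aic(candidates[o], o[0], o[1], n_obs))
best_bic = min(candidates, key=lambda o: bic(candidates[o], o[0], o[1], n_obs))
```

With these made-up numbers AIC selects ARMA(2, 1) while BIC, whose penalty per parameter is heavier when ln N exceeds 2, selects ARMA(1, 1).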
9.6 Choices of Basis Vectors

A transfer function has infinitely many equivalent representations in state
space form. Let $S_n$ denote the set of all possible state space models of
dimension n and $S_n/\!\sim$ denote its equivalence class. A function defined on $S_n$ is called
an invariant for $\sim$ if its value is the same for two equivalent representations,
i.e., if x and y belong to the same class in $S_n/\!\sim$, then f(x) = f(y). The Markov
parameters are shown to be invariants in Chapter 7.

Let us now return to the Hankel matrix (4) associated with conditional
predictions of future y's of Section 7.1. We can assume that the p components
of the vector $y_{t+1|t}$ are all linearly independent. The first p row vectors of
H are then automatically included in any basis for the row space of H. Because
the rows of H are made up of blocks of p row vectors and because of the regular
patterns of submatrices of H, if row i is in the linear span of the preceding
rows, then so is row (i + p).

A basis for the row space of H is denoted by a set of n indices that
designate the row vectors in the basis, $i = (i_1, \ldots, i_n)$, $i_1 < i_2 < \ldots < i_n$, where n = rank H.
Number the rows of H by $h_{jk}$, where $h_{jk}$ denotes the row associated with the j-th
component of $y_{t+k|t}$. A basis i can be specified by p integers (called structure
indices): the set $\{n_1, n_2, \ldots, n_p\}$, where $\sum_k n_k = n$, and $n_k$ is the smallest integer
such that the row $k + n_k p$ is not in the basis, k = 1, ..., p. This construction means that
the rows $h_{11}, \ldots, h_{1n_1}, h_{21}, \ldots, h_{2n_2}, \ldots, h_{p1}, \ldots, h_{pn_p}$ are in the basis i. To
select basis vectors, start from $h_{11}$. Next check the first elements of the
subsequent blocks, $h_{12}$, $h_{13}$, etc., and stop at $h_{1,n_1+1}$, the first element of the
$(n_1+1)$-th block (the first component of $y_{t+n_1|t-1}$). Then $h_{1,n_1+1}$ is linearly
dependent on the rows previously chosen. This ends the first string. Next we start
with $h_{21}$, and so forth.

A simple alternative selection rule of basis vectors that turns out to be
sensitive to numerical noises such as round-offs or sampling errors is this: Choose
the first n linearly independent vectors. This selection rule leads to a state
space model in canonical form as we later explain.
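
The two selection rules described above can be sketched as follows, assuming NumPy. Linear independence is tested with a rank computation and a fixed tolerance, which is an assumption of this sketch, and the system (A, B, C) used to generate a finite block Hankel matrix is hypothetical. It is built so that the first output depends on a one-dimensional subsystem and the second on a two-dimensional one.

```python
import numpy as np

def is_independent(rows, candidate, tol=1e-9):
    """True if candidate is linearly independent of the rows already chosen."""
    stacked = np.vstack(rows + [candidate])
    return np.linalg.matrix_rank(stacked, tol=tol) == len(rows) + 1

def canonical_basis(H, n):
    """Indices (1-based) of the first n linearly independent rows of H."""
    chosen = []
    for i in range(H.shape[0]):
        if len(chosen) == n:
            break
        if is_independent([H[c] for c in chosen], H[i]):
            chosen.append(i)
    return [c + 1 for c in chosen]

def string_basis(H, p):
    """Select rows string by string; return 1-based indices and structure indices."""
    chosen, structure = [], []
    for j in range(p):
        n_j = 0
        while (j + n_j * p < H.shape[0]
               and is_independent([H[c] for c in chosen], H[j + n_j * p])):
            chosen.append(j + n_j * p)      # row h_{j+1, n_j+1} joins the basis
            n_j += 1
        structure.append(n_j)
    return sorted(c + 1 for c in chosen), structure

# Hypothetical system with p = 2 outputs and n = 3 states.
A = np.array([[0.5, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, -0.06, 0.5]])
B = np.array([[1.0, 0.5], [0.3, 1.0], [-0.2, 0.4]])
C = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])

# Finite block Hankel matrix with blocks C A^(i+j) B, 4 block rows, 6 block columns.
H = np.block([[C @ np.linalg.matrix_power(A, i + j) @ B for j in range(6)]
              for i in range(4)])

index_set, structure = string_basis(H, p=2)
first_three = canonical_basis(H, n=3)
```

For this particular system both rules select rows 1, 2 and 4, with structure indices (1, 2). For other systems the two rules can select different rows.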

An example may clarify the procedure. Suppose n = 3 and p = 2. The next
table shows i = (1, 2, 4) where x denotes independent rows and 0 dependent rows.
Here $n_1 = 1$ and $n_2 = 2$ because $h_{11}$, $h_{21}$, and $h_{22}$ are in the basis.

                 y_{t|t-1}    y_{t+1|t-1}      y_{t+2|t-1}
    1st string       x        0 (n_1 = 1)
    2nd string       x            x            0 (n_2 = 2)

In the canonical selection rule, the row vectors are examined in sequence,
$h_{11}, h_{21}, h_{12}, h_{22}$, etc. In this model the same set of rows $h_{11}, h_{21}, h_{22}$ is
selected.

The next table illustrates a possible basis for i = (1, 2, 3, 4, 6),
n = 5 and p = 3.

                 y_{t|t-1}    y_{t+1|t-1}      y_{t+2|t-1}
    1st string       x            x            0 (n_1 = 2)
    2nd string       x        0 (n_2 = 1)
    3rd string       x            x            0 (n_3 = 2)

The basis vectors are selected in a number of "passes" in both tables. In the
first table, the first string of linearly independent row vectors, which are
selected in the first pass over the rows of H, consists of $h_{11}$ by itself because
$h_{12}$ is linearly dependent on $h_{11}$. We earlier remarked that if $h_{12}$ is linearly
dependent so are all $h_{1j}$, j = 3, 4, .... The second pass yields $h_{21}$ and $h_{22}$
because $h_{23}$ becomes linearly dependent on the previously chosen basis vectors.
In the second table, the third string contains $h_{31}$ and $h_{32}$. The vector $h_{33}$ is linearly
dependent on the previously chosen vectors, hence $n_3 = 2$.

In the case of the first table, if ($n_1 = 1$, $n_2 = 2$) resulted from choosing
the first three linearly independent vectors, then

$$ h_{12} = \alpha_{111}h_{11} + \alpha_{121}h_{21} $$

and

$$ h_{23} = \alpha_{211}h_{11} + \alpha_{221}h_{21} + \alpha_{222}h_{22}. $$

But if ($n_1 = 1$, $n_2 = 2$) resulted not by choosing the first three linearly
independent vectors, then

$$ h_{12} = \alpha_{111}h_{11} + \alpha_{121}h_{21} + \alpha_{122}h_{22} $$

would result. The former expression for $h_{12}$ is then the canonical representation.
By construction of the structure indices,

(9) $h_{i,n_i+1} = \sum_{j=1}^{p}\sum_{k=1}^{n_j} \alpha_{ijk}h_{jk}$, i = 1, ..., p.

Equivalently,

$$ y^i_{t+n_i|t-1} = \sum_{j=1}^{p}\sum_{k=1}^{n_j} \alpha_{ijk}\, y^j_{t+k-1|t-1}. $$

These np numbers $\{\alpha_{ijk}\}$ and another np numbers, the first p elements $h_{ij}(k)$ of the row
vectors $h_{ij}$, i = 1, ..., p, j = 1, ..., $n_i$, k = 1, ..., p, together completely specify
the Markov parameters. The latter np parameters appear in constructing state
dynamic equations as we now show.

9.7 State Space Model



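Now, define a state vector by

$$ x_{t+1} = [\,y^1_{t+1|t},\; y^1_{t+2|t},\; \ldots,\; y^1_{t+n_1|t},\; \ldots,\; y^p_{t+1|t},\; \ldots,\; y^p_{t+n_p|t}\,]', $$

where $y^j_{t+k|t}$ is the j-th component of the vector $y_{t+k|t}$ and corresponds to
$h_{jk}$. Take the conditional expectation of the representation of $\{y_t\}$
given by $y_{t+k} = \sum_{i=0}^{\infty} H_i\varepsilon_{t+k-i}$ with respect to the information set at t to
obtain

$$ y_{t+k|t} = \sum_{i=k}^{\infty} H_i\varepsilon_{t+k-i} = H_k\varepsilon_t + \sum_{i=k+1}^{\infty} H_i\varepsilon_{t+k-i} $$

or, recognizing that the second term equals $y_{t+k|t-1}$, we can write

$$ y_{t+k|t} = H_k\varepsilon_t + y_{t+k|t-1}. $$

We can thus express the state vector $x_{t+1}$ as

$$ x_{t+1} = \begin{pmatrix} y^1_{t+1|t-1} \\ \vdots \\ y^p_{t+n_p|t-1} \end{pmatrix} + K\varepsilon_t \quad \text{where} \quad K = \begin{pmatrix} h_{11}(1) & \ldots & h_{11}(p) \\ \vdots & & \vdots \\ h_{1n_1}(1) & \ldots & h_{1n_1}(p) \\ \vdots & & \vdots \\ h_{pn_p}(1) & \ldots & h_{pn_p}(p) \end{pmatrix}. $$

In view of (9) the first term can be written as $Fx_t$ if the matrix F is
introduced with the block structure

$$ F = \begin{pmatrix} F_{11} & F_{12} & \ldots & F_{1p} \\ \vdots & & & \vdots \\ F_{p1} & \ldots & & F_{pp} \end{pmatrix} $$

where the diagonal submatrices are

$$ F_{ii} = \begin{pmatrix} 0 & I_{n_i-1} \\ \alpha_{ii1} & \alpha_{ii2} \;\ldots\; \alpha_{iin_i} \end{pmatrix} $$

and off-diagonal submatrices are similar except for the absence of the identity
matrix block,

$$ F_{12} = \begin{pmatrix} 0 & \ldots & 0 \\ \vdots & & \vdots \\ \alpha_{121} & \ldots & \alpha_{12n_2} \end{pmatrix} $$

for example.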



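The $\alpha$'s in the last rows of $F_{ij}$ are the system parameters. Because $y_t =
y_{t|t-1} + \varepsilon_t$, the prediction $y_{t|t-1}$ can be reconstructed from $x_t$ by picking
up the first components of each of p blocks, i.e., $y_{t|t-1} = Nx_t$ where the blocks of N =
$[N_1, \ldots, N_p]$ are simply given by

$$ N_1 = \begin{pmatrix} 1 & 0 & \ldots & 0 \\ 0 & 0 & \ldots & 0 \\ \vdots & & & \vdots \\ 0 & 0 & \ldots & 0 \end{pmatrix}, \quad \ldots, \quad N_p = \begin{pmatrix} 0 & 0 & \ldots & 0 \\ \vdots & & & \vdots \\ 0 & 0 & \ldots & 0 \\ 1 & 0 & \ldots & 0 \end{pmatrix}, $$

where $N_j$ is a $p \times n_j$ matrix whose only non-zero entry is a one in row j of its first column.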


Example (Denham [1974]) Let p = 2 and n = 1. Because the dimension is one,
only one row vector is linearly independent, so i contains a single element.
We consider three alternative choices of the index set: i = {1}, i = {2},
and i = {3}.

i = {1}:
$$ y^1_{t+1|t} = \alpha_1 y^1_{t|t-1} + [h_{11}(1), h_{11}(2)]\varepsilon_t, \qquad y_t = \begin{pmatrix} 1 \\ \beta_{12} \end{pmatrix} y^1_{t|t-1} + \varepsilon_t, $$

i = {2}:
$$ y^2_{t+1|t} = \alpha_2 y^2_{t|t-1} + [h_{21}(1), h_{21}(2)]\varepsilon_t, \qquad y_t = \begin{pmatrix} \beta_{21} \\ 1 \end{pmatrix} y^2_{t|t-1} + \varepsilon_t, $$

i = {3}:
$$ y^1_{t+2|t} = \alpha_3 y^1_{t+1|t-1} + [h_{12}(1), h_{12}(2)]\varepsilon_t, \qquad y_t = [\,\ldots\,]\, y^1_{t+1|t-1} + \varepsilon_t. $$

The canonical form corresponding to i = {2} is

$$ y^2_{t+1|t} = \alpha_2 y^2_{t|t-1} + [h_{21}(1), h_{21}(2)]\varepsilon_t, \qquad y_t = \begin{pmatrix} 0 \\ 1 \end{pmatrix} y^2_{t|t-1} + \varepsilon_t, $$

because $y^2_{t|t-1}$ being the first linearly independent vector implies that
$y^1_{t|t-1}$ is not, i.e., $y^1_{t|t-1} = 0$.

Except for the nongeneric case $\beta_{12} = 0$ or $\beta_{21} = 0$, either form can parameterize
all possible processes except for the canonical form.

Another example, from Wertz [1981], also illustrates the special nature of
the canonical representation. Let p = 2 and n = 3, first with $n_1 = 2$ and $n_2 = 1$
so that i = (1, 2, 3) and then with $n_1 = 1$, $n_2 = 2$ so that i = (1, 2, 4). In
the first case $x_{t+1} = (y^1_{t+1|t},\; y^1_{t+2|t},\; y^2_{t+1|t})'$. The state dynamic equation
becomes

$$ x_{t+1} = \begin{pmatrix} 0 & 1 & 0 \\ \alpha_{111} & \alpha_{112} & \alpha_{121} \\ \alpha_{211} & \alpha_{212} & \alpha_{221} \end{pmatrix} \begin{pmatrix} y^1_{t|t-1} \\ y^1_{t+1|t-1} \\ y^2_{t|t-1} \end{pmatrix} + \begin{pmatrix} h_{11}(1) & h_{11}(2) \\ h_{12}(1) & h_{12}(2) \\ h_{21}(1) & h_{21}(2) \end{pmatrix}\varepsilon_t $$

and

$$ y_t = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix} x_t + \varepsilon_t. $$

In the second case, the state vector is $w_{t+1} = (y^1_{t+1|t},\; y^2_{t+1|t},\; y^2_{t+2|t})'$.
The Markovian model becomes

$$ w_{t+1} = \begin{pmatrix} \alpha_{111} & \alpha_{121} & \alpha_{122} \\ 0 & 0 & 1 \\ \alpha_{211} & \alpha_{221} & \alpha_{222} \end{pmatrix} \begin{pmatrix} y^1_{t|t-1} \\ y^2_{t|t-1} \\ y^2_{t+1|t-1} \end{pmatrix} + \begin{pmatrix} h_{11}(1) & h_{11}(2) \\ h_{21}(1) & h_{21}(2) \\ h_{22}(1) & h_{22}(2) \end{pmatrix}\varepsilon_t, $$

$$ y_t = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix} w_t + \varepsilon_t. $$

If i = (1, 2, 4) is the set of the first three linearly independent vectors,
then $\alpha_{122}$ is zero, yielding the canonical form.

The representations for i = (1, 2, 3) and i = (1, 2, 4) overlap if the
latter does not correspond to the canonical form because the transformation

$$ x_t = \begin{pmatrix} 1 & 0 & 0 \\ \alpha_{111} & \alpha_{121} & \alpha_{122} \\ 0 & 1 & 0 \end{pmatrix} w_t $$

changes the representation from one local coordinate to the other in all generic
cases, i.e., when $\alpha_{122} \ne 0$. Note that this is not possible for the canonical
form because the transformation is singular if $\alpha_{122}$ is zero.



9.8 ARMA (Input-Output) Model

A choice of the basis for the row space of H equivalently yields ARMA
models. Suppose that the structure indices are $n_1, \ldots, n_p$. The dependent
rows in H are given by (9) of Section 9.6. It is reproduced
here for easy reference.

$$ y^i_{t+n_i|t-1} = \sum_{j=1}^{p}\sum_{k=1}^{n_j} \alpha_{ijk}\, y^j_{t+k-1|t-1}, \qquad i = 1, \ldots, p. $$

Write this for all p series as

(+) $A(z)\, y_{t|t-1} = 0$

where the elements of A(z) are:

$$ a_{ii}(z) = z^{n_i} - \alpha_{iin_i} z^{n_i-1} - \ldots - \alpha_{ii1}, $$

$$ a_{ij}(z) = -\alpha_{ijn_j} z^{n_j-1} - \ldots - \alpha_{ij1}, \qquad i \ne j. $$

Let r be the maximum of the structure indices, max $(n_1, \ldots, n_p)$. Then (+)
can equivalently be written as

$$ A_0 y_{t+r|t-1} + A_1 y_{t+r-1|t-1} + \ldots + A_r y_{t|t-1} = 0. $$

Noting that

$$ y_{t+k|t-1} = y_{t+k} - \sum_{i=0}^{k} H_i\varepsilon_{t+k-i} = y_{t+k} - H_0\varepsilon_{t+k} - \ldots - H_k\varepsilon_t, $$

the ARMA model corresponding to (+) is obtained:

$$ A_0 y_{t+r} + A_1 y_{t+r-1} + \ldots + A_r y_t = B_0\varepsilon_{t+r} + \ldots + B_r\varepsilon_t $$

where

$$ B_0 = A_0, \qquad B_1 = A_1 + A_0 H_1, \qquad \ldots, \qquad B_r = A_r + A_{r-1}H_1 + \ldots + A_1 H_{r-1} + A_0 H_r. $$

By construction, deg $a_{ij}(z)$ < deg $a_{jj}(z) = n_j$, and



deg & _ (z) = 



fr , if n^ = r. 



r-1, if n^ < r. 



The parameters of A(z) are exactly the same as the np parameters in
the state transition matrix F. The matrix B(z) as calculated above contains
rows of H that are not in the state space model.
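
The relation between the coefficient matrices $A_j$, the Markov parameters $H_i$ (with $H_0 = I$), and the moving-average matrices $B_j$ of this section can be sketched as follows, assuming NumPy. The function name and the scalar ARMA(1, 1) check, with hypothetical parameters phi and theta, are illustrative assumptions.

```python
import numpy as np

def ma_coefficients(A_list, H_list):
    """B_j = sum_{i=0}^{j} A_{j-i} H_i for j = 0, ..., r, with H_0 = I."""
    r = len(A_list) - 1
    return [sum(A_list[j - i] @ H_list[i] for i in range(j + 1))
            for j in range(r + 1)]

# Scalar check with a hypothetical ARMA(1, 1):
#   y_t = phi * y_{t-1} + e_t + theta * e_{t-1},
# whose Markov parameters are H_0 = 1 and H_1 = phi + theta.
phi, theta = 0.7, 0.4
A_list = [np.array([[1.0]]), np.array([[-phi]])]          # A_0, A_1
H_list = [np.array([[1.0]]), np.array([[phi + theta]])]   # H_0, H_1

B_list = ma_coefficients(A_list, H_list)
```

In this scalar case `B_list` recovers the moving-average coefficients 1 and theta, as expected from $B_1 = A_1 + A_0 H_1$.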

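To see that B(z) can be reduced, start from the state space model
constructed in the previous section. Partition $x_t$ conformably into p
subvector components

$$ x_t = \begin{pmatrix} x_{1t} \\ x_{2t} \\ \vdots \\ x_{pt} \end{pmatrix}. $$

From the special structure of the matrix N in $y_t = Nx_t + \varepsilon_t$, $y^j_t$ equals
the first element of $x_{jt}$ plus $\varepsilon^j_t$, i.e.,

$$ x^1_{jt} = y^j_t - \varepsilon^j_t. $$

The dynamics $x_{t+1} = Fx_t + K\varepsilon_t$ show that

$$ x^1_{jt+1} = x^2_{jt} + k_{j1}\varepsilon_t, $$

so that

$$ x^2_{jt} = x^1_{jt+1} - k_{j1}\varepsilon_t = y^j_{t+1} - \varepsilon^j_{t+1} - k_{j1}\varepsilon_t = z\,y^j_t - z\,\varepsilon^j_t - k_{j1}\varepsilon_t, $$

where $k_{j1} = (h_{j1}(1), \ldots, h_{j1}(p))$. Proceeding in the same way,

$$ x^2_{jt+1} = x^3_{jt} + k_{j2}\varepsilon_t, $$

$$ x^3_{jt} = x^2_{jt+1} - k_{j2}\varepsilon_t = y^j_{t+2} - \varepsilon^j_{t+2} - k_{j1}\varepsilon_{t+1} - k_{j2}\varepsilon_t = z^2 y^j_t - z^2\varepsilon^j_t - z\,k_{j1}\varepsilon_t - k_{j2}\varepsilon_t, $$

$$ \vdots $$

$$ x^{n_j}_{jt} = z^{n_j-1} y^j_t - z^{n_j-1}\varepsilon^j_t - k_{j1} z^{n_j-2}\varepsilon_t - \ldots - k_{j,n_j-1}\varepsilon_t. $$

Collecting them together, we can express $x_t$ as

(10) $x_t = V(z)\,y_t - W(z)\,\varepsilon_t$

where

$$ V(z) = \begin{pmatrix} V_1(z) & 0 & \ldots & 0 \\ 0 & V_2(z) & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \ldots & 0 & V_p(z) \end{pmatrix}, \qquad V_i(z) = (1, z, \ldots, z^{n_i-1})', \quad i = 1, \ldots, p, $$

and

$$ W(z) = \begin{pmatrix} W_1(z) \\ W_2(z) \\ \vdots \\ W_p(z) \end{pmatrix}. $$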

9.9 Canonical Correlation 

Earlier we have introduced Hankel matrices as a way of relating the
predicted future realization of a time series to the realization of the current
and past exogenous noises. Another Hankel matrix results from calculating the
correlation of future realizations of a time series with a finite set of past
data

$$ E\left\{ \begin{pmatrix} y_{t+1} \\ y_{t+2} \\ \vdots \\ y_{t+N} \end{pmatrix} [\,y_t' \;\; y_{t-1}' \;\; \ldots \;\; y_{t-N+1}'\,] \right\} = \begin{pmatrix} \Lambda_1 & \Lambda_2 & \ldots & \Lambda_N \\ \Lambda_2 & \Lambda_3 & \ldots & \Lambda_{N+1} \\ \vdots & & & \vdots \\ \Lambda_N & \Lambda_{N+1} & \ldots & \Lambda_{2N-1} \end{pmatrix} $$

where

$$ \Lambda_i = E(y_{t+i}\, y_t'). $$

This idea of projecting future observations on the subspace in the Hilbert 
space spanned by the current and past noises or observations is a basis for 
many algorithms for prediction and model construction (the so-called stochastic 
realization problem) . For example, see Faurre [1976] or Akaike [1976] to which 
we shortly return. 

In practice, a sequence of sample covariances $R_s = (1/N)\sum_t y_{t+s}\, y_t'$ is
constructed from a data set $(y_1, \ldots, y_N)$.* Then, for numerical reasons, the
Hankel matrix H is often scaled by $R_p^{-1/2} H R_p^{-1/2}$, where $R_p = (R_{|i-j|})$, $1 \le i, j \le p$,
is a block Toeplitz matrix with the same submatrices arranged on block
diagonal lines, and the singular value decomposition applied to the rescaled
matrix.

* Some recommend dividing by N-t rather than N to preserve positive semi-
definiteness of the covariance matrix. See Section 8.3 and van Zee [1981].

If $\nu$ is an upper bound on the order of the ARMA model, then the $(\nu \times \nu)$ upper
left hand corner block submatrix of the Hankel matrix can be generated as the
correlation matrix of the future and past data vectors: Define

$$ y_t^- = (y_{t-1}', y_{t-2}', \ldots, y_{t-\nu}')' $$

and

$$ y_t^+ = (y_t', y_{t+1}', \ldots, y_{t+\nu-1}')'. $$

Then

$$ H_\nu = E\{y_t^+ (y_t^-)'\} = E\{y_0^+ (y_0^-)'\} $$

where the second equality follows from the assumed weak stationarity of $\{y_t\}$.
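
A minimal sketch of this construction for a scalar series follows, assuming NumPy. The simulated AR(1) data, the coefficient 0.8, the seed, and the choice $\nu = 3$ are all hypothetical; for a vector series the Toeplitz blocks would need transposes below the diagonal, which this scalar sketch avoids.

```python
import numpy as np

def sample_covariances(y, max_lag):
    """R_s = (1/N) sum_t y_{t+s} y_t for s = 0, ..., max_lag (scalar series)."""
    N = len(y)
    return [float(y[s:] @ y[:N - s]) / N for s in range(max_lag + 1)]

def inv_sqrt(S):
    """Symmetric inverse square root of a positive definite matrix."""
    lam, gam = np.linalg.eigh(S)
    return (gam / np.sqrt(lam)) @ gam.T

# Hypothetical data: a simulated scalar AR(1) series with coefficient 0.8.
rng = np.random.default_rng(1)
e = rng.standard_normal(5000)
y = np.zeros(5000)
for t in range(1, 5000):
    y[t] = 0.8 * y[t - 1] + e[t]

nu = 3
R = sample_covariances(y, 2 * nu - 1)
H = np.array([[R[i + j + 1] for j in range(nu)] for i in range(nu)])     # Hankel
T = np.array([[R[abs(i - j)] for j in range(nu)] for i in range(nu)])    # Toeplitz
scaled = inv_sqrt(T) @ H @ inv_sqrt(T)
sv = np.linalg.svd(scaled, compute_uv=False)
```

For this first-order series a single singular value of the rescaled matrix stands out, close to the autoregressive coefficient, and the rest are near zero, which suggests a one-dimensional state.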

The dimension of the state space model can be characterized in at least
two equivalent ways. The identification example suggests that the order of
the ARMA model is given by the smallest positive integer n such that rank $H_n$ =
rank $H_{n+i}$ for all i $\ge$ 1. Alternatively, the conditional prediction example
suggests that the order is the smallest positive integer n such that $y_{t+n|t-1}$
is linearly dependent on the predecessors $y_{t+i|t-1}$, i = 0, 1, ..., n-1, i.e.,
as the first row of H which is linearly dependent on the previous rows. These
two are equivalent because $H_\nu$ has rank n if and only if it has only n linearly
independent rows. This can be seen explicitly as follows:

$$ H_\nu = E\{y_t^+ (y_t^-)'\} = E\left\{ \begin{pmatrix} y_t \\ \vdots \\ y_{t+\nu-1} \end{pmatrix} (y_{t-1}', \ldots, y_{t-\nu}') \right\} = E\left\{ \begin{pmatrix} y_{t|t-1} \\ \vdots \\ y_{t+\nu-1|t-1} \end{pmatrix} (y_{t-1}', \ldots, y_{t-\nu}') \right\} = \begin{pmatrix} E\{y_{t|t-1}(y_t^-)'\} \\ \vdots \\ E\{y_{t+\nu-1|t-1}(y_t^-)'\} \end{pmatrix}. $$

We construct a state space model of the time series associated with this
Hankel matrix by the singular value decomposition of this Hankel matrix. Because
canonical correlation also obtains a singular value decomposition of a "normalized"
correlation matrix, this procedure reminds us of a closely related method
of constructing models of time series by canonical correlation proposed and
implemented by Akaike [1976].

We briefly describe the basic idea. Let $x$ and $y$ be two column vectors of
dimension $p$ and $q$ respectively, where $p \le q$ to be definite, with the
covariance matrix

$$
\operatorname{cov} \begin{bmatrix} x \\ y \end{bmatrix}
= \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}.
$$

To be definite we assume that $\Sigma_{11}$ and $\Sigma_{22}$ are nonsingular
and rank $\Sigma_{12} = r$. Let the singular value decomposition of
$\Sigma_{11}^{-1/2} \Sigma_{12} \Sigma_{22}^{-1/2}$ be $U P V'$. Because the
rank of this matrix is also $r$, the matrix $P$ is of the form
$P = [\operatorname{diag}(\rho_1 \ldots \rho_r\ 0 \ldots 0),\ 0] : p \times q$.
Define $L$ and $M$ to equal $U'\Sigma_{11}^{-1/2}$ and $V'\Sigma_{22}^{-1/2}$
respectively. Let $u = Lx$ and $v = My$. Then cov $u = U'U = I_p$,
cov $v = V'V = I_q$ and
$E(uv') = U'\Sigma_{11}^{-1/2}\Sigma_{12}\Sigma_{22}^{-1/2}V = P$. The change of
variables from $(x, y)$ to $(u, v)$ causes the covariance matrix of the
transformed variables to have a special structure:

$$
\operatorname{cov} \begin{bmatrix} u \\ v \end{bmatrix}
= \begin{bmatrix} I_p & P \\ P' & I_q \end{bmatrix}.
$$

This covariance matrix structure shows that $P = E(uv')$ is the correlation
matrix between two vectors $u$ and $v$, each normalized to have unit variances.
The components of the vector $u$ are called the canonical variables of $x$,
those of $v$ of $y$. The column vectors of $L'$ and $M'$ are called canonical
vectors. (Note the similarities with the definition of the principal
components. There, a given vector $x$ is represented as $\Gamma'x$ where the
column vectors of $\Gamma$ are the eigenvectors of the covariance matrix of
$x$. Here two vectors are involved.) The canonical variables have unit
variances. They have covariances (which are equal to the correlations because
of unit variances and zero means)



$$
\operatorname{cov}(u_i, v_j) =
\begin{cases}
\rho_i, & i = j,\ 1 \le i, j \le r, \\
0, & \text{otherwise.}
\end{cases}
$$

The positive singular values $\rho_1, \ldots, \rho_r$ of
$\Sigma_{11}^{-1/2}\Sigma_{12}\Sigma_{22}^{-1/2}$ are called the canonical
correlations. Suppose that $x$ and $y$ are jointly normal. Then the conditional
mean of $y$ given $x$ is $\Sigma_{21}\Sigma_{11}^{-1}x$ when $Ex$ and $Ey$ are
both zero, i.e., the matrix $\Sigma_{21}\Sigma_{11}^{-1}$ is recognized as the
matrix of regression coefficients in regressing $y$ on $x$.
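The construction of canonical variables can be checked numerically. The
following is a minimal sketch, assuming NumPy is available; the helper names
`inv_sqrt` and `canonical_correlations` and the randomly generated covariance
matrix are illustrative assumptions, not part of the original notes.

```python
import numpy as np

def inv_sqrt(S):
    """Inverse symmetric square root of a symmetric positive definite matrix."""
    w, Q = np.linalg.eigh(S)
    return Q @ np.diag(w ** -0.5) @ Q.T

def canonical_correlations(S11, S12, S22):
    """Return (rho, L, M) such that u = L x and v = M y are canonical variables."""
    S11_ih, S22_ih = inv_sqrt(S11), inv_sqrt(S22)
    # Full SVD: U is p x p, Vt is q x q, rho holds min(p, q) singular values.
    U, rho, Vt = np.linalg.svd(S11_ih @ S12 @ S22_ih)
    L = U.T @ S11_ih      # L = U' Sigma11^{-1/2}
    M = Vt @ S22_ih       # M = V' Sigma22^{-1/2}
    return rho, L, M

# Illustrative joint covariance of (x, y) with p = 2, q = 3 (made-up data).
rng = np.random.default_rng(0)
G = rng.standard_normal((5, 8))
S = G @ G.T / 8
S11, S12, S22 = S[:2, :2], S[:2, 2:], S[2:, 2:]

rho, L, M = canonical_correlations(S11, S12, S22)
cov_u = L @ S11 @ L.T     # should be I_p
cov_v = M @ S22 @ M.T     # should be I_q
cov_uv = L @ S12 @ M.T    # should be P = [diag(rho), 0]
```

The three products at the end reproduce the special covariance structure of
$(u, v)$ described above, up to rounding error.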




10 INNOVATION PROCESSES 



This chapter constructs innovation models to reproduce second order
properties of given time series. This construction completes the initial phase
of the process of building dynamic models for vector-valued time series which
was started in Section 9.1.




* If z and y are jointly normally distributed, then the conditional probability 
density is explicitly calculated to verify that (2) equals E(z|y). See Aoki [1967] 
for example.

When a new piece of data $u$ is added to the existing data $y$, the weights
(1), and the least squares estimate (2) are all altered in general. However,
by extracting a component from the new data that is uncorrelated with the
existing data we can calculate $E(z|y, u)$ quite easily. This should all be
very familiar to the reader who knows the Gram-Schmidt orthogonalization method
from nonlinear programming or from our discussion of the Cholesky
decomposition. The component of $u$ which is uncorrelated with $y$ is simply
that part of $u$ that is orthogonal to the subspace spanned by $y$:

$$
e = u - E(u|y) = u - E(uy')\,[E(yy')]^{-1}\, y. \tag{3}
$$



We can verify that $e$ and $y$ are indeed uncorrelated by calculating
$E(ey') = 0$.

The least squares estimate of $z$, given $y$ and $u$, is the same as the
least squares estimate of $z$, given $y$ and $e$. From (2)

$$
E(z|y, e) = E[z(y'\ e')]
\begin{bmatrix} E(yy') & 0 \\ 0 & E(ee') \end{bmatrix}^{-1}
\begin{bmatrix} y \\ e \end{bmatrix}
= E(zy')[E(yy')]^{-1} y + E(ze')[E(ee')]^{-1} e, \tag{4}
$$

i.e.,

$$
E(z|y, e) = E(z|y) + E(z|e).
$$



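Here we take advantage of the uncorrelatedness of $y$ and $e$:

$$
E\left\{ \begin{bmatrix} y \\ e \end{bmatrix} (y'\ e') \right\}
= \begin{bmatrix} E(yy') & 0 \\ 0 & E(ee') \end{bmatrix}.
$$

We can formally establish that $E(z|y, e) = E(z|y, u)$ by applying the
coordinate transformation, which restates (3),

$$
\begin{bmatrix} y \\ e \end{bmatrix}
= \begin{bmatrix} I & 0 \\ -E(uy')[E(yy')]^{-1} & I \end{bmatrix}
\begin{bmatrix} y \\ u \end{bmatrix}
$$

to

$$
E(z|y, u) = E[z(y'\ u')]
\left\{ E \begin{bmatrix} y \\ u \end{bmatrix} (y'\ u') \right\}^{-1}
\begin{bmatrix} y \\ u \end{bmatrix}.
$$

The required modification to the best least squares prediction is achieved by
merely calculating $E(z|e)$ as the correction term to the previous best
prediction.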

We call $e$ constructed by (3) the innovation of the new data. Now introduce
the time index explicitly, number the data vectors as $y_1, y_2, \ldots$, and
construct the innovations thus:

$$
e_1 = y_1, \qquad
e_i = y_i - E(y_i \,|\, y_{i-1}, \ldots, y_1), \quad i = 2, 3, \ldots
$$

Then

$$
E(z \,|\, y_n, y_{n-1}, \ldots, y_1) = \sum_{i=1}^{n} E(z|e_i)
= \sum_{i=1}^{n} K_i \Sigma_i^{-1} e_i, \tag{5}
$$

where

$$
K_i = E(z e_i') \quad \text{and} \quad \Sigma_i = E(e_i e_i').
$$

We see that the best estimate or prediction of $z$, given observations
$y_1, \ldots, y_n$ from a time series $\{y_t\}$, is expressed by an MA($n$)
model or as a sum of $n$ separate innovation terms: a finite data version of
the Wold decomposition.

By construction these $e$'s are uncorrelated. Let
$D = \operatorname{diag}(D_1, \ldots, D_n)$, and perform the Cholesky
factorization of $\Sigma_{yy} = E(yy')$, where $y' = (y_1', \ldots, y_n')$, as

$$
\Sigma_{yy} = LDL'
$$

where $L$ is a block lower triangular matrix which is the same as in the
Gram-Schmidt orthogonalization of the $y$ vector (see Aoki [1971; p.3]). Then
we can write

$$
\begin{bmatrix} e_1 \\ \vdots \\ e_n \end{bmatrix} = L^{-1} y.
$$

It is easy to see that $\{e_i\}$ are serially uncorrelated:

$$
E \begin{bmatrix} e_1 \\ \vdots \\ e_n \end{bmatrix} [e_1' \ldots e_n']
= L^{-1} E(yy') L'^{-1} = L^{-1} \Sigma_{yy} L'^{-1}
= L^{-1} L D L' L'^{-1} = D.
$$

Because $L$ is lower triangular, the operation can be reversed to generate
$y_i$ from $e_1, \ldots, e_i$ above:

$$
y = L \begin{bmatrix} e_1 \\ \vdots \\ e_n \end{bmatrix}.
$$

The sequences $\{e_t\}$ and $\{y_t\}$ are therefore said to be causally
equivalent because they span the same subspace ($\sigma$-field).
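As a concrete illustration, the factorization $\Sigma_{yy} = LDL'$ and the
causal map between data and innovations can be sketched for scalar
observations. The functions `ldl` and `innovations` and the AR(1)-type
covariance matrix below are assumptions made for this example only; the notes
treat the block (vector-valued) case.

```python
def ldl(S):
    """Factor a symmetric positive definite matrix as S = L D L',
    with L unit lower triangular and D diagonal (returned as a list)."""
    n = len(S)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    D = [0.0] * n
    for j in range(n):
        D[j] = S[j][j] - sum(L[j][k] ** 2 * D[k] for k in range(j))
        for i in range(j + 1, n):
            L[i][j] = (S[i][j]
                       - sum(L[i][k] * L[j][k] * D[k] for k in range(j))) / D[j]
    return L, D

def innovations(L, y):
    """Solve L e = y by forward substitution: e_i uses only y_1, ..., y_i."""
    e = []
    for i in range(len(y)):
        e.append(y[i] - sum(L[i][k] * e[k] for k in range(i)))
    return e

# Covariance of (y_1, y_2, y_3) for a scalar AR(1) process with
# coefficient 0.5 and unit variance (illustrative numbers).
S_yy = [[1.0, 0.5, 0.25],
        [0.5, 1.0, 0.5],
        [0.25, 0.5, 1.0]]
L, D = ldl(S_yy)

y = [1.0, -0.2, 0.7]
e = innovations(L, y)                      # e = L^{-1} y
y_back = [sum(L[i][k] * e[k] for k in range(i + 1)) for i in range(3)]  # y = L e
```

Here `D` holds the innovation variances, and `y_back` recovers the data from
the innovations, which is the causal equivalence noted above.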

10.2 Kalman Filters 



Suppose a Markov model for the time series $\{y_t\}$ is known to be given by

$$
\begin{aligned}
z_{t+1} &= A z_t + u_t, \\
y_t &= C z_t + v_t,
\end{aligned} \tag{1}
$$

where the noises are mean-zero and serially uncorrelated with covariance
matrices

$$
\operatorname{cov} \begin{bmatrix} u_t \\ v_t \end{bmatrix}
= \begin{bmatrix} Q_t & N_t \\ N_t' & R_t \end{bmatrix}.
$$

Start the Kalman filter for this model at time 1. It starts calculating
the (wide sense) conditional mean of $z_{t+1}$ given the data
$\{y_1, \ldots, y_t\}$, $t \ge 1$.







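Denote the data by $y^t$ for shorthand notation, and let $\hat{E}$ denote the
wide sense conditional mean. In the previous section we have established that
(see (1.5))

$$
\begin{aligned}
z_{t+1|t} &= \hat{E}(z_{t+1} \,|\, y^t) \\
&= \hat{E}(z_{t+1} \,|\, y^{t-1}, y_t) \\
&= \hat{E}(z_{t+1} \,|\, y^{t-1}) + \hat{E}(z_{t+1} \,|\, e_t) \\
&= z_{t+1|t-1} + K_t \Sigma_t^{-1} e_t,
\end{aligned}
$$

where $e_t = y_t - \hat{E}(y_t \,|\, y^{t-1})$, because $e_t$ is uncorrelated
with $y^{t-1}$ by construction, and where

$$
K_t = E(z_{t+1} e_t') \quad \text{and} \quad \Sigma_t = E(e_t e_t').
$$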



So far the Markov structure of (1) has not been utilized. We use the Markov
property of the model when we relate $z_{t+1}$ to $z_t$ by
$z_{t+1} = A z_t + u_t$, and write

$$
z_{t+1|t-1} = A z_{t|t-1} + u_{t|t-1} = A z_{t|t-1},
$$

since $\hat{E}(u_t \,|\, y^{t-1}) = 0$.



The Kalman filter for (1) can thus be written as

$$
z_{t+1|t} = A z_{t|t-1} + K_t \Sigma_t^{-1} e_t, \tag{2}
$$

where

$$
e_t = y_t - y_{t|t-1} = y_t - C z_{t|t-1}
$$

because $\hat{E}(v_t \,|\, y^{t-1})$ is also zero.

The sequence $\{e_t\}$ has been shown to be serially uncorrelated in the
previous section. It is the innovation sequence associated with (1).*

We need various covariances associated with (1) and (2) to proceed further.
First, define the prediction error covariance of the Kalman filter by $P_t$:

$$
P_t = E(z_t - z_{t|t-1})(z_t - z_{t|t-1})' = \Pi_t - Z_t \ge 0, \tag{3}
$$

where $\Pi_t$ and $Z_t$ are the covariances of $z_t$ and $z_{t|t-1}$
respectively:

$$
\Pi_t = E(z_t z_t') \quad \text{and} \quad Z_t = E(z_{t|t-1} z_{t|t-1}').
$$

The recursion (2) shows that the dynamics for $Z$'s are

$$
Z_{t+1} = A Z_t A' + K_t \Sigma_t^{-1} K_t' \tag{4}
$$

because $E(z_{t|t-1} e_t') = 0$.



The Markov model (1) yields the recursion of $\Pi$:

$$
\Pi_{t+1} = A \Pi_t A' + Q_t. \tag{5}
$$

The filter gain becomes, after $z_{t+1}$ is substituted out by (1) and $e_t$
by (2),

$$
\begin{aligned}
K_t &= E(z_{t+1} e_t') \\
&= E(A z_t + u_t)\{C(z_t - z_{t|t-1}) + v_t\}' \\
&= A P_t C' + N_t,
\end{aligned} \tag{6}
$$

where we use the fact that $z_t - z_{t|t-1}$ is uncorrelated with, i.e.,
orthogonal to, $z_{t|t-1}$. Similarly we can write

$$
\Sigma_t = E(e_t e_t') = C P_t C' + R_t. \tag{7}
$$

* Unless the initial state, the $u$'s and the $v$'s are all Gaussian in (1),
the $e$'s are, strictly speaking, pseudo-innovations.

Advancing $t$ by one in (3) and taking the difference of the two recursion
relations (4) and (5), we deduce the recursion formula for $P$:*

$$
P_{t+1} = \Pi_{t+1} - Z_{t+1} = A P_t A' + Q_t - K_t \Sigma_t^{-1} K_t'. \tag{8}
$$

The covariance matrices of $\{y_t\}$ are related to these matrices by

$$
Y_t = E y_t y_t' = C \Pi_t C' + R_t, \tag{9}
$$

and by expressing $y_t$ in terms of $z_s$ and the noises and using (1), we
calculate

$$
\Lambda_{t,s} = E(y_t y_s') = C A^{t-s-1} (A \Pi_s C' + N_s), \quad t > s. \tag{10}
$$
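The covariance recursions (3) through (8) can be traced numerically. The
sketch below does so for a scalar model with time-invariant noise covariances;
the function name `kalman_covariances` and all parameter values are
illustrative assumptions. It computes $P_{t+1}$ both from (8) and as
$\Pi_{t+1} - Z_{t+1}$, so that the two routes can be compared.

```python
def kalman_covariances(a, c, q, r, n, pi_1, steps):
    """Iterate (3)-(8) for z_{t+1} = a z_t + u_t, y_t = c z_t + v_t
    with cov(u_t) = q, cov(v_t) = r, cov(u_t, v_t) = n (all scalars)."""
    Pi, Z = pi_1, 0.0            # Pi_1 = cov z_1; Z_1 = 0 since no data yet
    rows = []
    for _ in range(steps):
        P = Pi - Z                               # (3)
        K = a * P * c + n                        # (6)
        Sigma = c * P * c + r                    # (7)
        P_next = a * P * a + q - K * K / Sigma   # (8)
        Pi = a * Pi * a + q                      # (5)
        Z = a * Z * a + K * K / Sigma            # (4)
        rows.append({"P": P, "K": K, "Sigma": Sigma,
                     "P_next_by_8": P_next, "P_next_by_3": Pi - Z})
    return rows

# Illustrative parameters; Pi_1 is set to the stationary state variance.
rows = kalman_covariances(a=0.8, c=1.0, q=1.0, r=0.5, n=0.2,
                          pi_1=1.0 / (1 - 0.8 ** 2), steps=200)
```

With a stable `a`, the entries of `rows` settle down after a few steps, which
anticipates the steady state analysis that follows.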
s s 

The derivations so far show that Kalman filters can deal with nonstationary
noises (and time-varying dynamics although we have not done so explicitly). For
our analysis of time series, however, we now assume that noises are wide-sense
stationary, dropping the time subscript from noise covariances, and assume $A$
and $C$ are constant matrices.

* Equation (8) can be directly obtained as follows. From (1) and (2),
$e_t = C(z_t - z_{t|t-1}) + v_t$. Hence

$$
\begin{aligned}
z_{t+1} - z_{t+1|t} &= A(z_t - z_{t|t-1}) + u_t - K_t \Sigma_t^{-1} e_t \\
&= (A - K_t \Sigma_t^{-1} C)(z_t - z_{t|t-1}) + u_t - K_t \Sigma_t^{-1} v_t.
\end{aligned}
$$

Taking the covariance of this, we note that

$$
P_{t+1} = (A - K_t \Sigma_t^{-1} C) P_t (A - K_t \Sigma_t^{-1} C)' + Q_t
+ K_t \Sigma_t^{-1} R_t \Sigma_t^{-1} K_t'
- N_t \Sigma_t^{-1} K_t' - K_t \Sigma_t^{-1} N_t'.
$$

Use the identity (7), $\Sigma_t = C P_t C' + R_t$, to write the quadratic term
in $K$ as $K_t \Sigma_t^{-1} K_t'$. Collect the terms linear in $K$ as
$-K_t \Sigma_t^{-1} C P_t A' - K_t \Sigma_t^{-1} N_t'
= -K_t \Sigma_t^{-1} (A P_t C' + N_t)' = -K_t \Sigma_t^{-1} K_t'$, together
with its transpose, where (6) is used. $P_{t+1}$ can be written as (8) because

$$
\begin{aligned}
P_{t+1} &= A P_t A' + Q_t - 2 K_t \Sigma_t^{-1} K_t' + K_t \Sigma_t^{-1} K_t' \\
&= A P_t A' + Q_t - K_t \Sigma_t^{-1} K_t'.
\end{aligned}
$$

Suppose now we examine the consequence of starting the Kalman filter off
from some remote past. First, let the subscript $t$ be replaced by $t - t_0$ to
indicate that the filter is started at time $t_0$. If the noise sequences are
wide sense stationary and $A$ is asymptotically stable, then letting $t_0$
recede into the past has the same effect on the recursions for $\{\Pi_t\}$ as
letting $t$ approach infinity. Hence, denoting the steady state (limiting)
value of $\Pi_t$ by $\Pi$ as $t \to \infty$, (5) shows that it satisfies an
algebraic matrix equation*

$$
\Pi = A \Pi A' + Q,
$$

and from (9) $\{y_t\}$ becomes stationary with the covariance matrices

$$
Y_t = Y_0 = C \Pi C' + R, \quad t > t_0, \tag{11}
$$

while (10) shows that

$$
E(y_t y_s') = \Lambda_{t-s} = C A^{t-s-1} (A \Pi C' + N), \quad t > s.
$$

In Chapter 9 the symbol $\Lambda_0$ was used for $Y_0$.

Similarly (8) becomes an algebraic equation for the limiting matrix of $P$

$$
P = A P A' + Q - K \Sigma^{-1} K',
$$

where (6) shows that

$$
K = A P C' + N \tag{12}
$$

and the limit of the innovation covariance matrix

$$
\Sigma = C P C' + R \tag{13}
$$

follows from (7). $P$ denotes the limiting value of $P_t$ as $t \to \infty$ or
equivalently as $t_0 \to -\infty$. Furthermore $P = \Pi - Z$ also holds, and
(4) tells us that $Z = A Z A' + K \Sigma^{-1} K'$.



* For the existence of the limit see Lyapunov Theorem in Appendix A.6.

From (11) and (13), the matrix $\Sigma$ is alternatively given by

$$
\Sigma = Y_0 - C Z C'.
$$

Letting $t_0 \to -\infty$, (3) shows that $\Pi = Z + P \ge Z$, hence
$\pi_* = Z$, i.e., the covariance associated with the Kalman filter (2) attains
the minimum among all $\pi$'s, i.e., among the covariance matrices of the state
vector $z_t$. From (4), $\pi_*$ satisfies
$\pi_* = A \pi_* A' + K \Sigma^{-1} K'$. Recalling Section 9.1, we recognize
that, given $A$, $C$ and $M$, $\pi_*$ may be iteratively calculated as follows
(see also (4.1) below):

$$
\begin{aligned}
\Theta_0 &= 0, \\
\Theta_{n+1} &= A \Theta_n A' + (M - A \Theta_n C')(\Lambda_0 - C \Theta_n C')^{-1}(M - A \Theta_n C')', \\
\pi_* &= \lim_{n \to \infty} \Theta_n.
\end{aligned}
$$

Earlier in Section 9.1, phase one of the process of constructing dynamic
models has been explained. In phase two, we estimate noise covariances,
assuming that $A$, $C$ and $M$ are known, by

$$
R = \Lambda_0 - C \pi_* C', \qquad
Q = \pi_* - A \pi_* A', \qquad \text{and} \qquad
N = M - A \pi_* C'.
$$
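The iteration for $\pi_*$ and the phase-two formulas for the noise covariances
can be sketched for a scalar series as follows. The function name
`minimal_state_covariance`, the fixed iteration count, and the "true" model
used to generate $\Lambda_0$ and $M$ are assumptions made purely for
illustration.

```python
def minimal_state_covariance(a, c, lam0, m, iters=500):
    """Iterate Theta_{n+1} = a Theta_n a + (m - a Theta_n c)^2 / (lam0 - c Theta_n c)
    from Theta_0 = 0 (scalar version of the recursion for pi_*)."""
    theta = 0.0
    for _ in range(iters):
        gain = m - a * theta * c
        theta = a * theta * a + gain * gain / (lam0 - c * theta * c)
    return theta

# Output statistics generated from an illustrative scalar Markov model.
a, c = 0.8, 1.0
q_true, r_true, n_true = 1.0, 0.5, 0.2
pi_true = q_true / (1 - a * a)          # Pi = a Pi a + q
lam0 = c * pi_true * c + r_true         # Lambda_0 = C Pi C' + R
m = a * pi_true * c + n_true            # M = A Pi C' + N

pi_star = minimal_state_covariance(a, c, lam0, m)
R = lam0 - c * pi_star * c              # equals the innovation covariance
Q = pi_star - a * pi_star * a
N = m - a * pi_star * c                 # equals the gain K
```

In this example the recovered `Q`, `R`, `N` differ from `q_true`, `r_true`,
`n_true`: they are the noise covariances of the model whose state covariance
is the minimal one, and they satisfy `Q * R == N * N` up to rounding.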



10.3 Innovation Model 

The previous section is based on model (2.1) which is the same as model (1)
of Section 9.1. We now construct the innovation model, model (I) below, for a
time series $\{y_t\}$. The innovation model is important because it is causally
invertible and can immediately be computed from the Kalman filter. Generate
matrix sequences $\{T_t\}$, $\{\Omega_t\}$ and $\{L_t\}$, corresponding to
$Z_t$, $\Sigma_t$ and $K_t$ of the previous section, by

$$
\begin{aligned}
T_{t+1} &= A T_t A' + (A T_t C' - M_t)(Y_t - C T_t C')^{-1}(A T_t C' - M_t)', \\
T_0 &= 0, \\
\Omega_t &= Y_t - C T_t C', \\
L_t &= M_t - A T_t C',
\end{aligned}
$$

where we assume that $\Omega_t$ is positive definite. (Otherwise use its
pseudo-inverse.)

Note that the recursion for $T$ can be written as

$$
T_{t+1} = A T_t A' + L_t \Omega_t^{-1} L_t'
$$

which is exactly the same as (2.4) if we identify $T_t$ with $Z_t$, $L_t$ with
$K_t$ and $\Omega_t$ with $\Sigma_t$. Similarly, comparison of these recursions
with those of (2.6), (2.7) and (2.9) reveals that $\Omega_t$ corresponds to
$\Sigma_t$ and $L_t$ to $K_t$, because (2.6) shows that
$K_t = -A Z_t C' + M_t$ where from (2.10) $M_t = A \Pi_t C' + N_t$. (2.7) and
(2.9) show that
$\Sigma_t = C P_t C' + R_t = C \Pi_t C' + R_t - C Z_t C' = Y_t - C Z_t C'$.


We define the innovation model for $\{y_t\}$ to be

$$
\begin{aligned}
\xi_{t+1} &= A \xi_t + L_t \Omega_t^{-1} \varepsilon_t, \\
w_t &= C \xi_t + \varepsilon_t, \\
\xi_0 &= 0,
\end{aligned} \tag{I}
$$

where $\xi_t$ is the state vector of the model, $w_t$ is its output vector
which reproduces the second order properties of the $y$'s, and

$$
E \varepsilon_t = 0, \qquad E \varepsilon_t \varepsilon_s' = \Omega_t \delta_{t,s}.
$$

Equation (2.2) is the Kalman filter of the model (2.1). Compare (2.2) with (I)
to note that $\{w_t\}$ is a realization of $\{y_t\}$ and
$E(\xi_t \xi_t') = T_t$. From the zero initial conditions for $\xi$ and $T$,
we also note that the covariances of $\{w_t\}$ coincide with $\{\Lambda_t\}$ of
(1): noting that $L_0 = M_0$ because of $T_0 = 0$,

$$
E(w_t w_0') = C A^{t-1} L_0 = C A^{t-1} M_0 = \Lambda_t.
$$

Conversely, a model with the state vector $s_t$ and output vector $y_t$

$$
\begin{aligned}
s_{t+1} &= A s_t + L_t \Omega_t^{-1} e_t, \\
y_t &= C s_t + e_t, \\
s_0 &= 0,
\end{aligned} \tag{1}
$$

with $L_t$ and $\Omega_t$ as in (I) is the innovation model for $\{y_t\}$ if
$\Omega_t$ is nonsingular.

We verify this claim by showing that the gain of the Kalman filter associated
with (1) is exactly $L_t \Omega_t^{-1}$ and that the innovation covariance is
exactly $\Omega_t$. First, construct its Kalman filter:

$$
s_{t+1|t} = A s_{t|t-1} + \Gamma_t (y_t - C s_{t|t-1}), \qquad s_{0|-1} = 0, \tag{2}
$$

where the Kalman filter gain is

$$
\Gamma_t = E(s_{t+1} e_t') \Omega_t^{-1} = L_t \Omega_t^{-1}.
$$

Because (1) and (2) have the same initial condition and the same dynamic
equation, we conclude that $s_t = s_{t|t-1}$. This can be seen in another way:
From (1), $s_{t+1} = A s_t + L_t \Omega_t^{-1} (y_t - C s_t)$, $s_0 = 0$. This
shows that $s_t$ is exactly computed from $y_\tau$, $\tau < t$, i.e.,
$E(s_t \,|\, y^{t-1}) = s_t$. Hence (1) can be written as

$$
s_{t+1|t} = A s_{t|t-1} + L_t \Omega_t^{-1} (y_t - C s_{t|t-1}).
$$

Once $s_t = s_{t|t-1}$ is established in (1), the innovation covariance equals

$$
\operatorname{cov}(y_t - C s_{t|t-1}) = \operatorname{cov} e_t = \Omega_t.
$$

Using the steady state (limit) values of $K$ and $\Sigma$, the innovation
model corresponding to (2.1) with initial time in the infinitely remote past is
obtained:

$$
\begin{aligned}
s_{t+1} &= A s_t + K \Sigma^{-1} e_t, \\
w_t &= C s_t + e_t,
\end{aligned} \tag{3}
$$

$$
E(e_t) = 0, \qquad E(e_t e_t') = \Sigma.
$$

For this model $E(s_t s_t') = Z = A Z A' + K \Sigma^{-1} K'$, i.e., the steady
state version of (2.4) holds. The dynamics are stable if and only if the matrix
$A - K \Sigma^{-1} C$, which results after substituting $e_t$ out by
$w_t - C s_t$, has all its eigenvalues inside the unit disc. This follows if
$(A, K \Sigma^{-1})$ is stabilizable and $(A, C)$ is observable.
See Aoki [1967, 1976].



Causal Invertibility 

Eliminate $\varepsilon_t$ from the innovation model (I) to rewrite it as

$$
\xi_{t+1} = A \xi_t + \Gamma_t (w_t - C \xi_t), \qquad \xi_0 = 0,
$$

where

$$
\Gamma_t = L_t \Omega_t^{-1}.
$$

Because of the zero initial condition $\xi_0 = 0$, $\xi_t$ is completely
determined by $w_0, \ldots, w_{t-1}$. These, in turn, show that
$\varepsilon_t$ is determined by $w_t$ and the past $w$'s.

The model (3) is causally invertible. To see this, suppose that the Kalman
filter for (3) is turned on at $t = t_0$. Then

$$
Z = E(s_t s_t') = E(s_{t|t-1} s_{t|t-1}') + P_{t-t_0} = Z_{t-t_0} + P_{t-t_0}.
$$

The limiting operation $t_0 \to -\infty$ causes $Z_{t-t_0} \to Z$, hence
$P_{t-t_0} \to 0$ or $s_{t|t-1} \to s_t$ in the mean-square sense. From (3)

$$
s_{t+1} = (A - K \Sigma^{-1} C) s_t + K \Sigma^{-1} w_t.
$$



A Markov model (2.1) with a stable $A$ matrix, i.e., where the eigenvalues of
$A$ are all less than one in magnitude, is called causally invertible if one
can construct sequences $u_t(t_0)$, $v_t(t_0)$ from
$y_{t_0}, y_{t_0+1}, \ldots, y_t$ such that $u_t(t_0) \to u_t$ and
$v_t(t_0) \to v_t$ in the mean-square sense as $t_0 \to -\infty$.



Define $v_t(t_0) = w_t - C \xi_{t|t-1}$, where $\xi_{t|t-1}$ is being computed
from $w_s$, $s = t_0, \ldots, t-1$. Then $v_t(t_0) \to w_t - C \xi_t$ in the
mean-square sense as $t_0 \to -\infty$. Conversely, suppose (2.1) is
invertible. From (2.1) for any $t > t_0$ we can write $z_t = z_t^1 + z_t^2$
where $z_t^1 = A^{t-t_0} z_{t_0}$ and
$z_t^2 = \sum_{j=t_0}^{t-1} A^{t-1-j} B e_j$, where $B = K \Sigma^{-1}$. Given
$\epsilon > 0$ choose $t_0$ so that $E(z_t^1 z_t^{1\prime}) < \epsilon I$. This
is possible because the magnitudes of eigenvalues of $A$ are less than one by
assumption. Choose $t_0$ such that $y_{t_0}, y_{t_0+1}, \ldots, y_t$ define an
estimate of $z_t^2$ with an error covariance less than $\epsilon I$. Let this
estimate of $z_t$ be denoted by $\hat{z}_t$. Then

$$
E\|z_t - \hat{z}_t\|^2 = E\|z_t^1 + z_t^2 - \hat{z}_t\|^2
\le 2E\|z_t^1\|^2 + 2E\|z_t^2 - \hat{z}_t\|^2 < 4n\epsilon
$$

where $n = \dim z_t$.

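Now $\operatorname{cov}(z_t) = \operatorname{cov}(z_t - z_{t|t-1}) + Z_t(t_0)$.
As $t_0 \to -\infty$, $\operatorname{cov}(z_t - z_{t|t-1}) \to 0$, hence
$E(z_t z_t') = \Pi = Z$. Also from this the following equalities hold:

$$
\begin{aligned}
E(y_t y_s') &= \Lambda_{t-s} = C A^{t-s-1} M, \quad t > s, \\
M &= A \Pi C' + N, \\
M &= A Z C' + K.
\end{aligned}
$$

From $\Pi = Z$ follows $N = K$, and $E(y_t y_t') = \Lambda_0 = C \Pi C' + R$
from the Markov model, and $\Lambda_0 = C Z C' + \Sigma$ from the innovation
model. Since $\Pi = Z$, $R = \Sigma$. From (2.1) $\Pi - A \Pi A' = Q$, and from
the innovation model, with $\Pi = Z$, $Z - A Z A' = K \Sigma^{-1} K'$.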



10.4 Output Statistics Kalman Filter

The previous sections use noise covariance information in their Kalman
filter calculation. Son and Anderson [1973] give alternative expressions
without noise covariances.

Write $K_t$ as*

$$
\begin{aligned}
K_t &= E(z_{t+1} e_t') \\
&= E z_{t+1} (y_t' - z_{t|t-1}' C') \\
&= E z_{t+1} y_t' - A Z_t C'
\end{aligned}
$$

and $\Sigma_t$ as

$$
\Sigma_t = E y_t y_t' - C Z_t C' = Y_t - C Z_t C'.
$$

Substitute $P_t$ out by (2.3) in (2.6) to express the gain matrix as

$$
\begin{aligned}
K_t &= A(\Pi_t - Z_t) C' + N_t \\
&= A \Pi_t C' + N_t - A Z_t C' \\
&= M_t - A Z_t C'
\end{aligned}
$$

where

$$
M_t = A \Pi_t C' + N_t
$$

and recognize that $M_s$ appears in

$$
\Lambda_{t,s} = E(y_t y_s') = C A^{t-s-1} M_s, \quad t > s.
$$

The recursion for $Z$ given by (2.4) then can be rewritten as

$$
Z_{t+1} = A Z_t A' + (M_t - A Z_t C')(Y_t - C Z_t C')^{-1}(M_t - A Z_t C')' \tag{1}
$$

where

$$
Y_t = E(y_t y_t'), \qquad Z_0 = 0.
$$



10.5 Spectral Factorization 

Because $\Lambda_k = E(y_{t+k} y_t') = C A^{k-1} M$, $k \ge 1$, the spectrum
of $\{y_t\}$ is given by

$$
S(z) = \Lambda_0 + C(zI - A)^{-1} M + M'(z^{-1} I - A')^{-1} C'
= H(z) + H'(z^{-1})
$$

where

$$
H(z) = \Lambda_0 / 2 + C(zI - A)^{-1} M.
$$

* Solo [1983] claims that $E(z_{t+1} y_t')$ can be obtained in a model fitting
exercise.

Define $W(z)$ as the transfer function of the innovation model

$$
W(z) = I + C(zI - A)^{-1} K \Sigma^{-1}
$$

where, with the initial condition specified at the infinitely remote past,
the following relations hold:

$$
\begin{aligned}
\Lambda_0 &= C \Pi C' + R = \Sigma + C Z C', & \Lambda_k &= C A^{k-1} M, \\
Z &= A Z A' + K \Sigma^{-1} K', & M &= A \Pi C' + N, \\
\Pi &= A \Pi A' + Q, & K &= M - A Z C'.
\end{aligned}
$$

Then the spectrum $S(z)$ can be factored as*

$$
S(z) = W(z) \Sigma W'(z^{-1}).
$$

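Note that the matrix $W(z)$ can be constructed from

$$
\begin{bmatrix} Z - A Z A' & M - A Z C' \\ (M - A Z C')' & \Lambda_0 - C Z C' \end{bmatrix}
= \begin{bmatrix} K \Sigma^{-1} K' & K \\ K' & \Sigma \end{bmatrix}
= \begin{bmatrix} K \Sigma^{-1} \\ I \end{bmatrix} \Sigma
\begin{bmatrix} \Sigma^{-1} K' & I \end{bmatrix}.
$$

When $Z$ equals $\Pi$, the above matrix can be written as

$$
\begin{bmatrix} \Pi - A \Pi A' & M - A \Pi C' \\ (M - A \Pi C')' & \Lambda_0 - C \Pi C' \end{bmatrix}
= \begin{bmatrix} Q & N \\ N' & R \end{bmatrix}
= \begin{bmatrix} K \Sigma^{-1} \\ I \end{bmatrix} \Sigma
\begin{bmatrix} \Sigma^{-1} K' & I \end{bmatrix}.
$$

* First, form the product

$$
\begin{aligned}
&\{I + C(zI - A)^{-1} K \Sigma^{-1}\} \Sigma \{I + \Sigma^{-1} K' (z^{-1} I - A')^{-1} C'\} \\
&\quad = \Sigma + C(zI - A)^{-1} K + K'(z^{-1} I - A')^{-1} C'
+ C(zI - A)^{-1} K \Sigma^{-1} K' (z^{-1} I - A')^{-1} C'.
\end{aligned}
$$

Substitute $\Lambda_0 - C Z C'$ for $\Sigma$ and $M - A Z C'$ for $K$, then
collect terms as follows:

$$
W(z) \Sigma W'(z^{-1}) = \Lambda_0 + C(zI - A)^{-1} M + M'(z^{-1} I - A')^{-1} C'
+ C(zI - A)^{-1} D (z^{-1} I - A')^{-1} C'
$$

where, noting that $K \Sigma^{-1} K'$ equals $Z - A Z A'$, $D$ is identically
equal to zero:

$$
D = Z - A Z A' - (zI - A) Z (z^{-1} I - A') - A Z (z^{-1} I - A') - (zI - A) Z A' = 0.
$$

This establishes the factorization.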
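The factorization $S(z) = W(z) \Sigma W'(z^{-1})$ can be spot-checked
numerically at points on the unit circle. The following scalar sketch uses
only the Python standard library; the parameter values, the fixed iteration
count used to reach the steady state $Z$, and the function names are
assumptions for illustration.

```python
import cmath

# Illustrative scalar output statistics (a, c, Lambda_0, M).
a, c = 0.8, 1.0
pi_state = 1.0 / (1 - a * a)         # state covariance for q = 1
lam0 = c * pi_state * c + 0.5        # Lambda_0 with r = 0.5
m = a * pi_state * c + 0.2           # M with n = 0.2

# Steady-state Z from Z <- a Z a + (m - a Z c)^2 / (lam0 - c Z c), Z_0 = 0.
z_cov = 0.0
for _ in range(500):
    k = m - a * z_cov * c
    z_cov = a * z_cov * a + k * k / (lam0 - c * z_cov * c)
K = m - a * z_cov * c                # K = M - A Z C'
Sigma = lam0 - c * z_cov * c         # Sigma = Lambda_0 - C Z C'

def spectrum(z):
    """S(z) = Lambda_0 + C (zI - A)^{-1} M + M' (z^{-1} I - A')^{-1} C'."""
    return lam0 + c * m / (z - a) + m * c / (1 / z - a)

def W(z):
    """Transfer function of the innovation model."""
    return 1 + c * K / ((z - a) * Sigma)

def factored(z):
    """W(z) Sigma W'(z^{-1}); the transpose is trivial in the scalar case."""
    return W(z) * Sigma * W(1 / z)

points = [cmath.exp(1j * w) for w in (0.1, 1.0, 2.5)]
errors = [abs(spectrum(z) - factored(z)) for z in points]
```

The entries of `errors` are at rounding level, and `spectrum(z)` is real and
positive on the unit circle, as a spectral density should be.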




11 TIME SERIES FROM INTERTEMPORAL OPTIMIZATION 



Economic time series are generated as economic agents engage in inter- 
temporal optimization. Although time is an extra complicating factor, dynamic 
optimization, i.e., optimization over time arises for the same reason that 
static optimization (i.e., linear and nonlinear programming) problems arise in 
economics: Trade-offs must be made in allocating scarce resources; the only 
difference being that the trade-offs over time also must be made because dynamics 
constrains choice sets effectively over time. Economic time series are usually 
nonstationary because circumstances facing optimizing economic agents change 
with time and do not remain the same. Time series are also nonlinear because
the dynamic structures generating the data are mostly nonlinear. We are thus faced
with nonstationary and nonlinear stochastic processes. 

Intertemporal optimization of dynamic systems can best be approached using 
Markovian or state-space representation of dynamic structure. This point of 
view is inherent in dynamic programming and has been vigorously pursued in the 
systems literature. Some examples to be introduced presently illustrate how 
state-space representation may naturally arise in economic intertemporal opti- 
mization problems. 

It should come as no surprise that the theory of dynamic optimization is
best developed for linear dynamic systems. Furthermore, optimization of
linear dynamic systems with quadratic performance indices can be developed
in an elementary and self-contained way without elaborate theory. Dynamic
programming, when it leads to closed form solutions, is most effective and 
conceptually straightforward. Linear dynamic systems with quadratic separable 
cost or performance indices constitute an important class of intertemporal
problems which yield explicit closed form optimization rules by dynamic
programming. For this reason, we begin our discussion of dynamic optimization
with linear dynamic systems with quadratic costs. Optimization of some
nonlinear dynamic systems with not-necessarily quadratic performance indices
may be iteratively approximated by solving a sequence of optimization problems
for linear dynamic systems with quadratic costs (Aoki [1962]). This further
motivates our study of linear dynamic system optimizations with quadratic costs.

When optimization problems with nonquadratic costs or nonlinear dynamics 
do not yield explicit analytical solutions by dynamic programming, we have no 
generally valid analytical tools for dealing with them. We must resort to pro- 
cedures to approximate nonstationary, nonlinear phenomena by locally stationary 
and locally linear ones. We can proceed in at least two ways. In one approach 
nonlinear dynamic systems are studied as deviation from some reference paths as 
we discuss in Chapter 6, i.e., decision or choice variables that are normally 
chosen to guide nonlinear dynamic systems along some reference paths are assumed 
known. (In the language of control theory, reference decision variables cause 
the nonlinear system to "track" or follow the reference time path.) We then 
focus on their deviational effects as the decision variables respond to deviations 
in exogenous variables causing the model to go off the reference paths. In this 
way, deviations of the actual time path from the reference paths are described
by (variational) linear dynamic equations. (See Aoki [1976; pp. 59-62] or
Aoki [1981; Chapter 2] for more detailed description of the procedure. Examples 
in macroeconomics are found in Aoki [1976; pp. 66-68, 239-243] and many places 
in Aoki [1981].) In econometrics linear (time series) models are often specified 
for variables that are logarithms of "more basic" variables, yielding so-called
log-linear models. These models may be interpreted to arise in the way we
described above as deviational or variational models. These models are
then converted to state space form to apply a body of well-developed theory
for dynamic optimization in state space form.

In the other approach we do not explicitly approximate nonlinear problems,
but rather work directly with first (and second) order necessary conditions
for optimality. Necessary conditions for optimality rarely yield closed-form
analytic solutions to optimization problems. Optimization problems are
usually too complicated to permit explicit analytic solutions. Necessary
conditions are more frequently used to characterize optimal solutions, to
narrow the class of possible solutions over which the search for optimal
solutions is conducted. This is very well understood in the engineering
literature. In economics, however, this seems to have been brought to the
attention of the profession by Hall [1978].

Even when explicit closed form solutions are not available, first and 
second order optimality conditions are often useful in characterizing optimal 
solutions or reducing the class of solutions from which optimal ones are to 
be chosen. Following Hall [1978], a number of recent investigators have employed
this approach effectively. Pontryagin's maximum principle is the most systematic
way to derive such first and second order conditions. We quote one version in
the Appendix, which is based on Canon et al. [1970] for discrete time dynamics.
For the continuous time version, see Lee and Markus [1967], Fleming and Rishel
[1975] or Kamien and Schwartz [1981].

11.1 Example: Dynamic Resource Allocation Problem* 

We use a simplified model of Long and Plosser [1983] to illustrate how
economic time series are generated as agents engage in dynamic, i.e.,
intertemporal optimization. As Long and Plosser mention, this model allows the
maximizing consumer sufficient intertemporal and intratemporal substitution
opportunities (i.e., among consumption goods and work vs. leisure) that he
chooses to spread effects of unanticipated output shocks through time and
across commodities. Thus, the output time series of various commodities can
show both persistent fluctuations and comovements. This example captures one
way that business cycles may result from such optimizing behavior. This example
serves yet another useful purpose because the concept of state introduced in
Chapter 2 naturally arises in formulating the intertemporal optimization
problem as a dynamic programming functional equation.

* The model discussed in this section is a simplified version of the one in
Long and Plosser [1983].

Consider a dynamic allocation decision problem in which an infinitely lived
individual allocates his time between leisure and work and the available output
between consumption and input for future production. First we discuss a
deterministic version, then a stochastic version. The former is used to
introduce and illustrate the dynamic programming procedure for formulating such
intertemporal, i.e., sequential decision problems, in particular Bellman's
principle of optimality. The latter is used to amplify on the notion of "state"
of a dynamic system.

There are two activities producing 2 goods, each of which is to be consumed
and also be used as inputs. In its deterministic version the problem is to
maximize the present value at time $t$ of the discounted sum of utilities given
by

$$
U_t = \sum_{\tau = t}^{\infty} \beta^{\tau - t} u(C_\tau, Z_\tau), \quad 0 < \beta < 1, \tag{1}
$$

where

$$
u(C_\tau, Z_\tau) = \theta_0 \ln Z_\tau + \theta_1 \ln C_{1\tau} + \theta_2 \ln C_{2\tau},
\qquad \theta_0 > 0,\ \theta_i > 0,\ i = 1, 2,
$$

subject to the next three constraints:

$$
C_{jt} + X_{1jt} + X_{2jt} \le Y_{jt}, \quad j = 1, 2, \tag{2}
$$

$$
Y_{i,t+1} = L_{it}^{b_i} \prod_{j=1}^{2} X_{ijt}^{a_{ij}}, \tag{3}
$$

where

$$
b_i + \sum_{j=1}^{2} a_{ij} = 1, \quad i = 1, 2,
$$

and

$$
Z_t + L_{1t} + L_{2t} = H. \tag{4}
$$

The log-linear utility function is used to yield an analytically closed
form solution. The leisure time is denoted by $Z_t$. In (2), $X_{ijt}$ denotes
the amount of good $j$ allocated as input to produce good $i$. The time devoted
to producing good $i$ is denoted by $L_{it}$ in (3). Equation (3) is the
Cobb-Douglas production function. The parameters $\theta$'s, $b$'s and
$a_{ij}$'s express the individual's preferences and production technologies
respectively and do not change. They are the structural parameters.

Since $H$ remains constant, the knowledge of $Y_t = (Y_{1t}, Y_{2t})$ at time
$t$ completely specifies the maximum attained by $U_t$. For this reason we call
$Y_t$ the state vector of the problem. The constrained maximum of $U_t$ is
called the optimal value $V_t$. Since it depends only on $Y_t$ we write it as

$$
V(Y_t) = \max \{ U_t \ \text{subject to (2)} \sim \text{(4)} \}.
$$

Note that $U_t$ is maximized with respect to all current and future decision
variables. The current allocation decision variables are $L_{it}$, $C_{it}$,
$X_{ijt}$, $i, j = 1, 2$. Given the current decision, the immediate or period
$t$ return is $u(C_t, Z_t)$. The state $Y_t$ is transformed into $Y_{t+1}$ and
the problem starts all over again, i.e., the problem of choosing $L_{i\tau}$,
$C_{i\tau}$, $X_{ij\tau}$ for $\tau \ge t+1$ has the same structure as the
decision problem facing the individual at time $t$. Given that an optimal
sequence of decisions is made from $t+1$ on, the maximum value is $V(Y_{t+1})$.
Discounting the value from the future optimal sequence of decisions, the
decision at $t$ must, therefore, maximize the discounted sum
$u(C_t, Z_t) + \beta V(Y_{t+1})$, i.e.,

$$
V(Y_t) = \max_{d_t} \{ u(C_t, Z_t) + \beta V(Y_{t+1}) \} \tag{5}
$$

where $d_t$ stands for all current decision variables. Equation (5) thus
stands for a sequence of nested decisions

$$
V(Y_t) = \max_{d_t} \{ u(C_t, Z_t) + \beta \max_{d_{t+1}} \{ u(C_{t+1}, Z_{t+1})
+ \beta \max_{d_{t+2}} \{ u(C_{t+2}, Z_{t+2}) + \cdots \} \} \}.
$$

If a sequence of decisions $\{d_t, d_{t+1}, d_{t+2}, \ldots\}$ is optimal, then
the subsequence of decisions covering decisions from time $t+1$ on,
$\{d_{t+1}, d_{t+2}, \ldots\}$, must be optimal from time $t+1$ on. This is an
illustration of Bellman's principle of optimality.

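Equation (5) is a functional equation that $V(\cdot)$ must satisfy. In
general this equation does not admit a closed form solution if a general
$u(\cdot, \cdot)$ and a general production technology are employed. Our choice
of the log-linear utility function and the Cobb-Douglas production function
allows a closed form solution.*

* Another class of problem specifications allowing for closed form solutions
are linear dynamics and quadratic objective functions.

Try

$$
V(Y_t) = \gamma_1 \ln Y_{1t} + \gamma_2 \ln Y_{2t} + v_t. \tag{6}
$$

Substituting this into the right hand side of (5) we note

$$
\theta_0 \ln Z_t + \theta_1 \ln C_{1t} + \theta_2 \ln C_{2t}
+ \beta \{ \gamma_1 \ln Y_{1,t+1} + \gamma_2 \ln Y_{2,t+1} + v_{t+1} \}.
$$

After (3) is substituted into $\ln Y_{i,t+1}$, $i = 1, 2$, maximizing the above
is a static optimization problem solved by techniques of nonlinear programming.
The first order conditions for optimality (these conditions are also sufficient
for this problem) are:

$$
Z_t = \theta_0 / \lambda, \qquad
L_{it} = \beta \gamma_i b_i / \lambda, \qquad
C_{it} = \theta_i / \mu_i, \qquad
X_{ijt} = \beta \gamma_i a_{ij} / \mu_j,
$$

where

$$
\gamma_j = \theta_j + \beta \sum_i \gamma_i a_{ij}, \quad j = 1, 2,
$$

and where $\lambda$ and $\mu_j$ are the Lagrange multipliers associated with
(4) and (2) respectively. (We note that the inequality (2) is always binding
for our problem, i.e., the inequality is replaced with the equality.)

Determine $\lambda$ and $\mu_j$ from (2) and (4) as

$$
\lambda = \Big( \theta_0 + \beta \sum_i \gamma_i b_i \Big) / H
\qquad \text{and} \qquad
\mu_j = \gamma_j / Y_{jt}.
$$

Hence the optimal decisions are given by*

$$
\begin{aligned}
C_{it} &= (\theta_i / \gamma_i) Y_{it}, \\
X_{ijt} &= (\beta \gamma_i a_{ij} / \gamma_j) Y_{jt}, \\
L_{it} &= H \beta \gamma_i b_i \Big/ \Big( \theta_0 + \beta \sum_j \gamma_j b_j \Big), \\
Z_t &= H \theta_0 \Big/ \Big( \theta_0 + \beta \sum_i \gamma_i b_i \Big).
\end{aligned} \tag{7}
$$

The constant term in (6) evolves according to

$$
v_t = \beta v_{t+1} + w
$$

where

$$
\begin{aligned}
w &= \theta_0 \ln(\theta_0 / \lambda) + \sum_i \theta_i \ln(\theta_i / \gamma_i)
+ \beta \sum_i \gamma_i \Big\{ b_i \ln(\beta \gamma_i b_i / \lambda)
+ \sum_j a_{ij} \ln(\beta \gamma_i a_{ij} / \gamma_j) \Big\} \\
&= \theta_0 \ln \theta_0 - \lambda H \ln \lambda + \sum_j \theta_j \ln \theta_j
+ \beta \ln \beta \sum_i \gamma_i - (1 - \beta) \sum_i \gamma_i \ln \gamma_i
+ \beta \sum_i \gamma_i \Big\{ b_i \ln b_i + \sum_j a_{ij} \ln a_{ij} \Big\}.
\end{aligned}
$$

The transversality condition to ensure finite optimal value is

$$
\lim_{\tau \to \infty} \beta^\tau v_{t+\tau} = 0 \quad \text{for all } t > 0.
$$

Then

$$
v_t = w / (1 - \beta).
$$

* In (7), $\gamma$'s are derived parameters of the optimal decision rules.

Substituting (7) into (3), the optimal outputs are governed by

$$
y_{t+1} = A y_t + n \tag{8}
$$

where

$$
A = (a_{ij}), \qquad
y_t = \begin{bmatrix} y_{1t} \\ y_{2t} \end{bmatrix}
\ \text{where}\ y_{it} = \ln Y_{it}, \qquad
n = \begin{bmatrix} n_1 \\ n_2 \end{bmatrix},
$$

and

$$
n_i = b_i \ln(\beta \gamma_i b_i / \lambda)
+ \sum_j a_{ij} \ln(\beta \gamma_i a_{ij} / \gamma_j).
$$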
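The closed form solution can be verified numerically for a particular set of
structural parameters. In the sketch below all parameter values are made up
for illustration, and the names `decide` and `next_output` are hypothetical.
It solves $\gamma_j = \theta_j + \beta \sum_i \gamma_i a_{ij}$, applies the
decision rules (7), and checks the constraints (2) and (4) and the
log-linear dynamics (8).

```python
import math

# Illustrative structural parameters (assumed values, not from the notes).
beta = 0.95
theta_leisure = 1.0            # theta_0
theta = [0.6, 0.4]             # theta_1, theta_2
b = [0.3, 0.2]                 # labor exponents b_i
a = [[0.4, 0.3],               # a[i][j]: exponent of good j in producing good i
     [0.5, 0.3]]               # each row satisfies b[i] + sum_j a[i][j] = 1
H = 1.0                        # total time available

# Solve (I - beta A') gamma = theta for the 2 x 2 case by Cramer's rule.
m11, m12 = 1 - beta * a[0][0], -beta * a[1][0]
m21, m22 = -beta * a[0][1], 1 - beta * a[1][1]
det = m11 * m22 - m12 * m21
gamma = [(theta[0] * m22 - m12 * theta[1]) / det,
         (m11 * theta[1] - m21 * theta[0]) / det]

# Lagrange multiplier of the time constraint (4).
lam = (theta_leisure + beta * sum(g * bi for g, bi in zip(gamma, b))) / H

def decide(Y):
    """Optimal decisions (7) given the state Y = (Y_1, Y_2)."""
    C = [theta[i] / gamma[i] * Y[i] for i in range(2)]
    X = [[beta * gamma[i] * a[i][j] / gamma[j] * Y[j] for j in range(2)]
         for i in range(2)]
    L = [beta * gamma[i] * b[i] / lam for i in range(2)]
    Z = theta_leisure / lam
    return C, X, L, Z

def next_output(X, L):
    """Cobb-Douglas production function (3)."""
    return [L[i] ** b[i] * X[i][0] ** a[i][0] * X[i][1] ** a[i][1]
            for i in range(2)]

# Constant vector n of the log-linear dynamics (8).
n = [b[i] * math.log(beta * gamma[i] * b[i] / lam)
     + sum(a[i][j] * math.log(beta * gamma[i] * a[i][j] / gamma[j])
           for j in range(2))
     for i in range(2)]

Y = [2.0, 3.0]
C, X, L, Z = decide(Y)
Y_next = next_output(X, L)
log_pred = [sum(a[i][j] * math.log(Y[j]) for j in range(2)) + n[i]
            for i in range(2)]
```

Comparing `[math.log(v) for v in Y_next]` with `log_pred` confirms that, under
the rules (7), the logarithms of the outputs follow the first order difference
equation (8).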

We introduce stochastic elements by a random production or technological 
disturbance 

Y it + 1 = 

where = (X^ t+ ^,X assume< ^ to a Markovian process, i.e., the 

distribution function F (X. , _ I X. , , ...) equals F (X^ , _ I X^) . We assume 

t+1 1 t t-1 t+1' t 

that the value of X becomes known at time t+1. The notion of state must 
t+1 

now be enlarged to include A^_ because Y^_ and A^_ now completely determine 
future evolution of Y's and A's. Also we now maximaize the expected dis- 
counted streams of utilities. 

Equation (5) is replaced with

$$V(S_t) = \max_{d_t}\{u(C_t, Z_t) + \beta E(V(S_{t+1}) \mid S_t)\}$$

where

$$S_t = (Y_t, \lambda_t).$$

Equation (6) changes into

$$V(S_t) = \gamma_1 \ln Y_{1t} + \gamma_2 \ln Y_{2t} + N(\lambda_t) + v_t$$

where

$$N(\lambda_t) = \beta E\Big\{\sum_i \gamma_i \ln\lambda_{it+1} + N(\lambda_{t+1}) \,\Big|\, \lambda_t\Big\}.$$

With these changes the optimal decisions given by (7) remain valid. The dynamics for $y_t$ are now stochastic, however, and are given by

$$y_{t+1} = A y_t + \eta_{t+1} + \eta \tag{9}$$

where

$$\eta_{it+1} = \ln\lambda_{it+1}.$$
it+l it+1 

We have seen that the intertemporal optimization problem of this example led to the difference equation (8), which generates the sequence of y's. When we introduce randomness into the model, by means of stochastic yields, for example, the difference equation becomes the stochastic difference equation (9), which generates a sequence of random variables, i.e., a time series.

This difference equation is in the state space form, i.e., in a Markovian model form, because it is a first order difference equation for the state vector. How does it relate to models more familiar to econometricians?

Is it an AR, MA or ARMA model? We can answer this question easily by applying the Cayley-Hamilton theorem to eliminate the matrix A from the dynamic relations between the y's and the exogenous noises. (See Aoki [1976; p.45], for example.) This theorem states that, the matrix A being 2 by 2, $A^2$ can be expressed as a linear combination of A and I, i.e., $A^2 = -\alpha A - \beta I$ for some constants $\alpha$ and $\beta$. The dynamic equation is $y_{t+1} = A y_t + v_t$ where A is 2 by 2. Advance t by one to note that

$$y_{t+2} = A y_{t+1} + v_{t+1} = A(A y_t + v_t) + v_{t+1} = A^2 y_t + A v_t + v_{t+1}.$$

Multiply $y_{t+1}$ and $y_t$ by the constants $\alpha$ and $\beta$ respectively, and add them to $y_{t+2}$ to obtain

$$
\begin{aligned}
y_{t+2} + \alpha y_{t+1} + \beta y_t &= (A^2 + \alpha A + \beta I) y_t + v_{t+1} + (A + \alpha I) v_t \\
&= v_{t+1} + (A + \alpha I) v_t.
\end{aligned}
$$

This is an ARMA model involving the vector processes $\{y_t\}$ and $\{v_t\}$. The converse procedure is also possible, i.e., any model in AR, MA, ARMA or ARIMA form, etc., can be converted to a state space or Markovian model form. As we show elsewhere, the state space or Markovian model representation and the ARMA-like representation are equivalent.
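The elimination of A by the Cayley-Hamilton theorem can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the matrix A and the simulated disturbances are hypothetical illustration values, and for a 2 by 2 matrix the constants are taken as $\alpha = -\mathrm{tr}\,A$ and $\beta = \det A$.

```python
import numpy as np

# Hypothetical 2x2 dynamic matrix; any 2x2 matrix would serve.
A = np.array([[0.3, 0.5],
              [0.2, 0.6]])
I2 = np.eye(2)

# Cayley-Hamilton for a 2x2 matrix: A^2 + alpha*A + beta*I = 0,
# with alpha = -trace(A) and beta = det(A).
alpha = -np.trace(A)
beta = np.linalg.det(A)
ch_gap = np.max(np.abs(A @ A + alpha * A + beta * I2))

# Simulate the state space model y_{t+1} = A y_t + v_t.
rng = np.random.default_rng(0)
T = 50
v = rng.normal(size=(T, 2))
y = np.zeros((T + 1, 2))
for t in range(T):
    y[t + 1] = A @ y[t] + v[t]

# ARMA form: y_{t+2} + alpha*y_{t+1} + beta*y_t = v_{t+1} + (A + alpha*I) v_t.
lhs = y[2:] + alpha * y[1:-1] + beta * y[:-2]
rhs = v[1:] + v[:-1] @ (A + alpha * I2).T
arma_gap = np.max(np.abs(lhs - rhs))
print(ch_gap, arma_gap)
```

Both printed gaps should be at the level of rounding error, which is what the equivalence of the two representations requires along any simulated path.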

Note that the elements of the matrix A are the parameters of the production function. The parameters $\theta_0$, $\theta_1$ and $\theta_2$ characterize the utility function. The dynamics exhibit oscillatory behavior if the eigenvalues of A are complex, or one peak may exist for a two-dimensional dynamics even when the two eigenvalues both have negative real parts.

Will this two sector model exhibit a hump-shaped multiplier profile said to be characteristic of the real output? The dynamic multiplier of (8) is given by $A^k\eta$. Using the spectral decomposition of A, we can write

$$A = \sum_i \lambda_i u_i v_i',$$

where $\lambda_i$ is the eigenvalue corresponding to the right eigenvector $u_i$, and $v_i'$ is its (row) left-hand eigenvector.

For example, the total output multiplier with an exogenous shock to the second sector is equal to

$$m_k = (1 \;\; 1)\, A^k \begin{pmatrix} 0 \\ 1 \end{pmatrix} = \sum_i \lambda_i^k (u_{i1} + u_{i2})\, v_{i2}, \qquad k = 0, 1, \ldots$$

This is the multiplier time profile of exogenous shocks to the second sector.



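For the matrix A, the eigenvectors are

$$u_i = \begin{pmatrix} 1 \\ (\lambda_i - a_{11})/a_{12} \end{pmatrix}, \quad i = 1, 2,$$

and

$$\begin{pmatrix} v_1' \\ v_2' \end{pmatrix} = \frac{a_{12}}{\lambda_2 - \lambda_1}\begin{pmatrix} (\lambda_2 - a_{11})/a_{12} & -1 \\ -(\lambda_1 - a_{11})/a_{12} & 1 \end{pmatrix}.$$

Hence

$$u_{i1} + u_{i2} = (\lambda_i - a_{11} + a_{12})/a_{12},$$

and

$$v_{12} = -a_{12}/(\lambda_2 - \lambda_1), \qquad v_{22} = a_{12}/(\lambda_2 - \lambda_1).$$

The multiplier profile is given by

$$m_k = (\lambda_2 - \lambda_1)^{-1}\{-(\lambda_1 - a_{11} + a_{12})\lambda_1^k + (\lambda_2 - a_{11} + a_{12})\lambda_2^k\}.$$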



A sufficient condition for the series $\{m_t\}$ to exhibit a peak is to have $m_1 > 1$ because $m_0 = 1$ and $m_\infty = 0$, or $a_{22} + a_{12} > 1$. This condition may be interpreted as saying that good 2 is productive as an input good. Although $a_{i1} + a_{i2} < 1$, i = 1, 2, the sum $a_{12} + a_{22}$ can very well be greater than one. For example, $a_{11} = 0.3$, $a_{12} = 0.5$, $a_{21} = 0.2$ and $a_{22} = 0.6$ yields

$$a_{22} + a_{12} = 1.1 > 1.$$
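The hump in the multiplier profile can be seen directly for the numerical example above. The sketch below, which assumes NumPy, computes $m_k$ both from the spectral decomposition and from powers of A; the function name `multiplier` is introduced here only for illustration.

```python
import numpy as np

# The a_ij values of the example in the text.
A = np.array([[0.3, 0.5],
              [0.2, 0.6]])

lam, U = np.linalg.eig(A)   # columns of U are the right eigenvectors u_i
V = np.linalg.inv(U)        # rows of V are the left eigenvectors v_i'

def multiplier(k):
    """m_k = sum_i lam_i^k (u_i1 + u_i2) v_i2, the total output multiplier."""
    terms = [lam[i] ** k * U[:, i].sum() * V[i, 1] for i in range(2)]
    return float(np.real(sum(terms)))

profile = [multiplier(k) for k in range(8)]

# Direct computation of (1 1) A^k (0 1)' for comparison.
shock = np.array([0.0, 1.0])
direct = [float(np.ones(2) @ np.linalg.matrix_power(A, k) @ shock)
          for k in range(8)]
print(profile)
```

The profile starts at $m_0 = 1$, rises to $m_1 = a_{22} + a_{12} = 1.1$, and then decays, since both eigenvalues of this A (0.8 and 0.1) lie inside the unit circle.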



11.2 Quadratic Regulation Problems 

Minimization of quadratic costs subject to linear dynamic constraints is often called the LQ problem, and is basic in many intertemporal optimization formulations. This class of problems is basic partly because the LQ problems are analytically tractable and give us insight into the structure of more general problems, while minimization of nonquadratic costs or inclusion of nonlinear constraints usually leads to analytically intractable problems. This fact alone justifies the study of the LQ problems. Furthermore, optimization problems with nonquadratic criteria and/or nonlinear dynamic constraints can often be iteratively approximated by a sequence of problems with quadratic costs and linear dynamic constraints. See Aoki [1962] for example. This is another reason for studying this class of intertemporal optimization problems.

This section discusses the LQ problems for continuous-time and discrete-time dynamic systems. See Canon et al. [1970] or Appendix A.16, for example, for general statements of the first order necessary conditions for optimality for discrete time problems (discrete maximum principle). Whittle [1982] has a readable treatment of the LQ problems for discrete time dynamics.

The maximum principle for continuous time systems is discussed in a number of books, such as Lee and Markus [1967], Fleming and Rishel [1975] and Kamien and Schwartz [1981].
Discrete-time Systems

Dynamic Programming is a powerful conceptual tool for dealing with sequential decision problems, i.e., intertemporal optimization problems. Bellman's principle of optimality produces functional equations for optimal value functions of dynamic optimization problems. Unfortunately the functional equations must be solved numerically except for a few special cases. Linear dynamic systems with quadratic performance indices admit an explicit solution to the functional equations of Dynamic Programming.

Measuring state vectors and instrument vectors from appropriate references or base time paths, "regulation" or "tracking" problems are often formulated as follows:

$$\text{Minimize}\quad z_T' P z_T + \sum_{\tau=t}^{T-1} w_\tau$$

where $w_\tau = z_\tau' Q_\tau z_\tau + x_\tau' R_\tau x_\tau$, subject to the constraint

$$z_{\tau+1} = A_\tau z_\tau + B_\tau x_\tau.$$

Denote the optimal value of the criterion by $J_{t,T}(z_t)$. Bellman's principle of optimality yields the functional equation

$$J_{t,T}(z_t) = \min_{x_t}\{w_t + J_{t+1,T}(z_{t+1})\} \tag{1}$$

where the minimization is with respect to the current choice vector $x_t$. Note the terminal condition

$$J_{T,T}(z_T) = z_T' P z_T.$$

This functional equation admits a solution of the form

$$J_{t,T}(z) = z'\, \Pi_{t,T}\, z. \tag{2}$$

Clearly, $\Pi_{T,T}$ equals P.



To eliminate clutter of subscripts, we use a convention of Whittle [1982] of understanding by $(\;)_t$ that all subscripted variables in the brackets have the same subscript t, unless otherwise noted. For example, $(A_t + B_t C_t)$ will be denoted by $(A + BC)_t$.

Substituting (2) into (1), the functional equation now becomes

$$z_t'\, \Pi_{t,T}\, z_t = \min_{x_t}\,[z'Qz + x'Rx + (Az + Bx)'\,\Pi_{t+1,T}\,(Az + Bx)]_t.$$

The expression in the brackets may be written as a quadratic form $(z', x')\,\Pi \begin{pmatrix} z \\ x\end{pmatrix}$ where

$$\Pi = \begin{pmatrix} \Pi_{zz} & \Pi_{zx} \\ \Pi_{xz} & \Pi_{xx} \end{pmatrix} = \begin{pmatrix} Q + A'\Pi_{t+1,T}A & A'\Pi_{t+1,T}B \\ B'\Pi_{t+1,T}A & R + B'\Pi_{t+1,T}B \end{pmatrix}.$$

The matrix $\Pi$ is symmetric and non-negative definite and $\Pi_{xx} = R + B'\Pi_{t+1,T}B$ is positive definite. The minimizing value of x is determined by

$$\Pi_{xz} z_t + \Pi_{xx} x_t = 0,$$

or

$$x_t = K_t z_t$$

where

$$K_t = -\Pi_{xx}^{-1}\Pi_{xz}.$$

The minimal value becomes

$$\Pi_{t,T} = \Pi_{zz} - \Pi_{zx}\Pi_{xx}^{-1}\Pi_{xz}.$$

Relabel $\Pi_{t,T}$ as $V_t$. Then this is a recursion for $V_t$. Because the corresponding equation for $V_t$ in continuous time is a differential equation known as a Riccati equation, the recursion for $V_t$ is also called the Riccati equation of discrete-time LQ problems.

Restricting the class of decisions to be linear in z, the recursion can also be written as

$$V_t = \min_{K}\,[Q + K'RK + (A + BK)'\,V_{t+1}\,(A + BK)]_t.$$

Bellman called it quasi-linear (because it is linear in V). He developed a method of approximation called quasi-linearization based on this equation. Also see Aoki [1968], who applied quasi-linearization to obtain an approximate solution to the Riccati equation.

As T goes to infinity, $V_t$ or $\Pi_{t,T}$ approaches a constant matrix V if certain conditions are met. The limit satisfies the algebraic Riccati equation

$$V = Q + A'VA - A'VB(R + B'VB)^{-1}B'VA.$$

The optimal decision rule becomes

$$x_t = K z_t$$

where

$$K = -(R + B'VB)^{-1}B'VA.$$
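The backward recursion and its limit can be illustrated numerically. This is a minimal sketch assuming NumPy; the system matrices are hypothetical and chosen only so that the recursion converges, and `riccati_step` is a name introduced here, not one used in the text.

```python
import numpy as np

def riccati_step(V_next, A, B, Q, R):
    """One backward step: V = Q + A'V A - A'V B (R + B'V B)^{-1} B'V A."""
    M = R + B.T @ V_next @ B
    K = -np.linalg.solve(M, B.T @ V_next @ A)    # decision rule x_t = K z_t
    V = Q + A.T @ V_next @ A + A.T @ V_next @ B @ K
    return V, K

# Hypothetical system matrices, for illustration only.
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

V = np.zeros((2, 2))        # terminal condition P = 0
for _ in range(500):
    V, K = riccati_step(V, A, B, Q, R)

# Residual of the algebraic Riccati equation at the (numerical) limit.
M = R + B.T @ V @ B
residual = Q + A.T @ V @ A - A.T @ V @ B @ np.linalg.solve(M, B.T @ V @ A) - V
closed_loop_radius = max(abs(np.linalg.eigvals(A + B @ K)))
print(np.max(np.abs(residual)), closed_loop_radius)
```

For this example the iterates settle quickly, the residual is at rounding level, and the closed-loop matrix A + BK has spectral radius below one even though A itself is not stable.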

The same approach can handle problems in which a cross product term such as $x_t'Sz_t$ is present in the criterion function.

We next treat this problem by another, short-cut method. Consider a minimization problem with the dynamic constraint

$$z_{t+1} = A z_t + B x_t, \qquad y_t = C z_t \tag{3}$$

and the criterion function

$$J_{t,N} = \sum_{\tau=t}^{N-1} (y'Qy + x'Rx)_\tau$$

where

$$Q' = Q > 0, \qquad R' = R > 0.$$

Generate $\{V_t\}$ by

$$z_t'V_tz_t = z_{t+1}'V_{t+1}z_{t+1} + y_t'Qy_t + x_t'Rx_t.$$

Substitute (3) for $z_{t+1}$ to rewrite the above as

$$z_t'V_tz_t = [z'C'QCz + x'Rx + (Az + Bx)'V_{t+1}(Az + Bx)]_t \tag{4}$$

or

$$
\begin{aligned}
0 &= z_t'(A'V_{t+1}A + C'QC - V_t)z_t + x_t'(R + B'V_{t+1}B)x_t + z_t'A'V_{t+1}Bx_t + x_t'B'V_{t+1}Az_t \\
&= [(x - Kz)'(R + B'V_{t+1}B)(x - Kz) + z'(A'V_{t+1}A + C'QC - V_t - K'(R + B'V_{t+1}B)K)z]_t
\end{aligned}
$$

where

$$K_t = -(R + B'V_{t+1}B)^{-1}B'V_{t+1}A.$$

In other words,

$$[y'Qy + x'Rx]_t = [(x - Kz)'(R + B'V_{t+1}B)(x - Kz) + z'(A'V_{t+1}A + C'QC - V_t - K'(R + B'V_{t+1}B)K)z]_t + z_t'V_tz_t - z_{t+1}'V_{t+1}z_{t+1}.$$

Now relate $V_{t+1}$ to $V_t$ by

$$
\begin{aligned}
V_t &= A'V_{t+1}A + C'QC - K_t'(R + B'V_{t+1}B)K_t \\
&= A'V_{t+1}A + C'QC - A'V_{t+1}B(R + B'V_{t+1}B)^{-1}B'V_{t+1}A.
\end{aligned} \tag{5}
$$

Then the criterion function is expressible in terms of the V's by

$$\sum_{\tau=t}^{N-1}(y'Qy + x'Rx)_\tau = z_t'V_tz_t - z_N'V_Nz_N + \sum_{\tau=t}^{N-1}(x_\tau - K_\tau z_\tau)'(R + B'V_{\tau+1}B)(x_\tau - K_\tau z_\tau).$$

Here $J_{t,N}$ is minimized by $x_t = K_tz_t$, and $\min J_{t,N} = z_t'V_tz_t$ by letting $V_N = 0$ as the terminal condition of the equation (5). Equation (5) is known as the (discrete) Riccati equation. We note that if $z_N'\Gamma z_N + J_{t,N}$ is the cost function, then a change of the terminal condition to $V_N = \Gamma$ is the only modification necessary.

The solution of a discrete-time regulator problem with a slightly more general cost structure

$$\text{Minimize}\quad \sum_{t=0}^{\infty} (z_t', x_t')\begin{pmatrix} Q & S' \\ S & R\end{pmatrix}\begin{pmatrix} z_t \\ x_t\end{pmatrix} \tag{6}$$

subject to

$$z_{t+1} = A z_t + B x_t, \tag{7}$$

can be stated in terms of the Riccati equation

$$P = A'PA + Q - (S + B'PA)'(R + B'PB)^{-1}(S + B'PA) \tag{8}$$

where $R + B'PB > 0$ is assumed.

As pointed out by Molinari [1975], (7) can be transformed by incorporating a reaction function

$$x_t = -Kz_t + v_t$$

into

$$z_{t+1} = A_Kz_t + Bv_t \tag{7'}$$

where

$$A_K = A - BK.$$

Because the optimal solution is unique on the assumption that the controllability and observability conditions are met, the same P, which is the positive definite solution of (8), satisfies

$$P = A_K'PA_K + Q_K - (S_K + B'PA_K)'(R + B'PB)^{-1}(S_K + B'PA_K)$$

where

$$Q_K = Q - S'K - K'S + K'RK, \qquad S_K = S - RK.$$

The matrices $Q_K$ and $S_K$ are defined to keep the same cost expression.
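The invariance of P under a reaction function can be checked numerically. The sketch below assumes NumPy, takes the reaction function to be $x_t = -Kz_t + v_t$, and uses hypothetical matrices; the Riccati equation with the cross term S is solved by simple fixed-point iteration from P = 0, which is an illustrative choice rather than a recommended algorithm.

```python
import numpy as np

def riccati_residual(P, A, B, Q, S, R):
    """A'PA + Q - (S + B'PA)'(R + B'PB)^{-1}(S + B'PA) - P."""
    G = S + B.T @ P @ A
    return A.T @ P @ A + Q - G.T @ np.linalg.solve(R + B.T @ P @ B, G) - P

def solve_riccati(A, B, Q, S, R, iters=2000):
    """Fixed-point iteration of the Riccati map, started at P = 0."""
    P = np.zeros_like(Q)
    for _ in range(iters):
        P = P + riccati_residual(P, A, B, Q, S, R)
    return P

# Hypothetical data, for illustration only.
A = np.array([[0.9, 0.3], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
S = np.array([[0.1, 0.2]])     # cross term x'Sz, so S is (m x n)
R = np.array([[1.0]])
K = np.array([[0.5, -0.2]])    # an arbitrary reaction-function gain

P = solve_riccati(A, B, Q, S, R)

# Transformed problem after substituting x_t = -K z_t + v_t.
A_K = A - B @ K
Q_K = Q - S.T @ K - K.T @ S + K.T @ R @ K
S_K = S - R @ K

res_original = np.max(np.abs(riccati_residual(P, A, B, Q, S, R)))
res_transformed = np.max(np.abs(riccati_residual(P, A_K, B, Q_K, S_K, R)))
print(res_original, res_transformed)
```

Both residuals should vanish up to rounding, showing that the same P solves the Riccati equation of the original and of the transformed problem.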



11.3 Parametric Analysis of Optimal Solutions 

Two or more distinct types of costs are often combined into a total cost function by assigning weights to each component of costs to produce a scalar-valued criterion function for static optimization problems. Similarly, errors from different causes are joined together with weights (such as inverses of error covariance matrices) to yield a scalar-valued criterion function in estimation problems. In such circumstances we want to know how sensitive optimal solutions are with respect to the weights in the criterion functions. Optimal estimation solutions often approach the least squares solutions as weights are taken to some limiting values. As an example, consider extracting an optimal secular growth time path, $\{g_t\}$, from given data, $\{y_t\}$, by minimizing the expression

$$\sum_t \big[(y_t - g_t)^2 + \lambda\{(g_t - g_{t-1}) - (g_{t-1} - g_{t-2})\}^2\big],$$

where two heterogeneous entities, i.e., the residuals $y_t - g_t$ and the second difference of the growth terms, are combined together with weight $\lambda$ to form an expression to be minimized. As $\lambda$ approaches infinity, the optimal growth time path approaches the least squares fit of the data, $\{y_t\}$, by a linear trend term because $g_t$ will approach $g_0 + \beta t$ for some constant $\beta$; see Hodrick and Prescott [1981].
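The limiting behavior as the weight grows can be seen in a small numerical experiment. The following sketch assumes NumPy and synthetic data; the first order condition of the minimization is written as the linear system $(I + \lambda D'D)g = y$, where D is the second-difference matrix, and `hp_trend` is a name introduced here for illustration.

```python
import numpy as np

def hp_trend(y, lam):
    """Minimize sum (y_t - g_t)^2 + lam * sum (second difference of g_t)^2."""
    T = len(y)
    D = np.zeros((T - 2, T))
    for i in range(T - 2):
        D[i, i:i + 3] = [1.0, -2.0, 1.0]    # g_t - 2 g_{t-1} + g_{t-2}
    return np.linalg.solve(np.eye(T) + lam * D.T @ D, y)

# Synthetic data: a linear trend plus noise (illustration only).
rng = np.random.default_rng(1)
T = 40
t = np.arange(T, dtype=float)
y = 2.0 + 0.5 * t + rng.normal(size=T)

# Least squares fit of a linear trend g_0 + beta * t.
X = np.column_stack([np.ones(T), t])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
ols_trend = X @ coef

gap_small_weight = np.max(np.abs(hp_trend(y, 1.0) - ols_trend))
gap_large_weight = np.max(np.abs(hp_trend(y, 1e8) - ols_trend))
print(gap_small_weight, gap_large_weight)
```

With a small weight the extracted path follows the data closely, while with a very large weight it is numerically indistinguishable from the least squares linear trend.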

Such a parametric study is important in dynamic optimization problems as well. We wish to learn how the optimal solution behaves as a function of the parameter in a criterion function, e.g., how large is the derivative (i.e., elasticity) of the optimal solution with respect to the parameter? Discrete-time problems turn out to be more cumbersome than continuous-time problems in answering this question. So we discuss the latter first.

Choice of Weighting Matrices 

The spectral decomposition of the dynamic matrix clearly shows that the speeds of responses are determined by the eigenvalues while the shapes of the transient responses, i.e., their time profiles, are influenced by the eigenvectors. During the 1960's system theory recognized, and used to advantage, the fact that feedback of state variables can be used to alter dynamics if the matrices A and B satisfy a certain rank condition (known as the stabilizability condition; see Wonham [1967] or Aoki [1976] for discussion of this condition). A feedback control rule or reaction function x = Fz modifies the dynamic matrix A into A + BF. For stabilizable systems, the eigenvalues of the feedback dynamic matrix A + BF can be assigned arbitrarily subject only to the complex conjugacy condition. The eigenvalues of the closed-loop systems (as feedback systems are often called) determine their speed of responses. The eigenvectors determine the time profiles or shapes of the transient responses. For systems with a single instrument, the associated eigenvectors are also uniquely determined once the eigenvalues are assigned. Hence the speed of responses and the shapes of the transient responses are simultaneously determined for dynamic systems with single instruments.

Parametric dependence of eigenvalues can be examined by the method of root-locus for dynamic systems with a single instrument and a single target variable*. See Aoki [1976; Appendix B]. The analysis using the root-locus clearly shows that if the transfer function is n(s)/d(s) where deg d = p and deg n = q, q < p, then p - q of the eigenvalues go to infinity while q of the eigenvalues approach the roots of n(s) = 0 as a certain system parameter approaches infinity.

Kwakernaak and Sivan [1972], Moore [1976], Harvey and Stein [1978] and others have generalized this result to multivariable cases and have also shown how the matrices Q and R in the quadratic criterion functions affect asymptotic properties of the optimized dynamics. In dynamic systems with several instruments, specification of the feedback system eigenvalues, i.e., the speeds of responses of A + BF, still leaves some freedom in choosing the associated eigenvectors or the shapes of the transient responses. This fact was established only in the middle of the 1970's with the appearance of Moore [1976]. The next simple examples illustrate. Suppose $A = \begin{pmatrix} 0 & 1 \\ -\alpha & -\beta\end{pmatrix}$ and $B = \begin{pmatrix} 0 \\ 1\end{pmatrix}$. Then with $F = (-f_1 \;\; -f_2)$, the feedback system dynamic matrix is $A + BF = \begin{pmatrix} 0 & 1 \\ -\gamma_2 & -\gamma_1\end{pmatrix}$ where $\gamma_2 = \alpha + f_1$, $\gamma_1 = \beta + f_2$. The eigenvalues $\lambda_1$ and $\lambda_2$ are uniquely determined once $\gamma_1$ and $\gamma_2$ are given. The eigenvectors are $\begin{pmatrix} 1 \\ \lambda_1\end{pmatrix}$, $\begin{pmatrix} 1 \\ \lambda_2\end{pmatrix}$. Since $\gamma_1$ and $\gamma_2$ are uniquely



* Appendix A.8 shows how simple the sensitivity analysis of dynamics is when y and the decision variable are both scalar.
determined by $f_1$ and $f_2$ for a given A, the feedback matrix F uniquely determines the eigenvectors as well as the eigenvalues. Next let the system have two instruments with $B = I_2$ and consider a reaction function with the feedback matrix $F = -\begin{pmatrix} f_1 & f_2 \\ f_3 & f_4\end{pmatrix}$. The characteristic polynomial is $\lambda^2 + \gamma_1\lambda + \gamma_2$ as before, where $\gamma_1 = \beta + f_1 + f_4$, $\gamma_2 = (\alpha + f_3)(1 - f_2) + f_1(\beta + f_4)$. The eigenvectors are $\begin{pmatrix} 1 - f_2 \\ \lambda_1 + f_1\end{pmatrix}$ and $\begin{pmatrix} 1 - f_2 \\ \lambda_2 + f_1\end{pmatrix}$ where $\lambda_1$ and $\lambda_2$ are the eigenvalues. They are still uniquely determined once $\gamma_1$ and $\gamma_2$ are given. Now, however, there are many ways of changing the eigenvectors while keeping $\lambda_1$ and $\lambda_2$ constant because there is more than one way of specifying $\gamma_1$ and $\gamma_2$ by changing the elements of F. For example, by varying the f's to keep $f_1 + f_4$ and $(\alpha + f_3)(1 - f_2) + f_1(\beta + f_4)$ constant, the eigenvalues remain the same while the eigenvectors change.
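The freedom left by two instruments can be made concrete with numbers. The sketch below assumes NumPy, takes A in the companion form with hypothetical values $\alpha = 2$, $\beta = 3$, and assumes B is the 2 by 2 identity; the two feedback matrices are chosen so that $\gamma_1 = 4$ and $\gamma_2 = 2$ in both cases.

```python
import numpy as np

alpha, beta = 2.0, 3.0                     # hypothetical parameter values
A = np.array([[0.0, 1.0], [-alpha, -beta]])
B = np.eye(2)                              # two instruments (assumed B = I)

def closed_loop(f1, f2, f3, f4):
    """A + BF with F = -[[f1, f2], [f3, f4]]."""
    F = -np.array([[f1, f2], [f3, f4]])
    return A + B @ F

# Both choices give gamma_1 = beta + f1 + f4 = 4 and
# gamma_2 = (alpha + f3)(1 - f2) + f1 (beta + f4) = 2.
Ac_a = closed_loop(0.0, 0.0, 0.0, 1.0)
Ac_b = closed_loop(1.0, 0.0, -3.0, 0.0)

eig_a = np.sort(np.linalg.eigvals(Ac_a))
eig_b = np.sort(np.linalg.eigvals(Ac_b))

def eigvec(f1, f2, lam):
    """Eigenvector (1 - f2, lam + f1)' of the closed-loop matrix."""
    return np.array([1.0 - f2, lam + f1])

lam1 = eig_a[0]
v_a = eigvec(0.0, 0.0, lam1)
v_b = eigvec(1.0, 0.0, lam1)
print(eig_a, eig_b, v_a, v_b)
```

The two closed-loop matrices share the eigenvalues $-2 \pm \sqrt 2$, yet the eigenvector attached to the same eigenvalue differs between them, so the transient shapes differ while the speeds of response do not.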

This lack of uniqueness, or freedom to choose eigenvalue-eigenvector pairs, for multiple-output systems with the closed-loop dynamic matrix A + BF, where B is n by m and F is m by n, means that the transient behavior of the closed-loop system can be influenced by our choice of eigenvectors which, in turn, gives us a clue for choosing correct weights in the criterion function to produce desired transient behavior. This relation is best understood by examining the dynamic behavior of closed-loop systems when the cost associated with changing instruments approaches zero, i.e., control is getting cheaper. In discussing this problem we also comment on the instrument instability question. For fuller discussion of the instrument instability see Aoki [1976; Section 5.2].

Asymptotic behavior is analogous to that of a single-input single-output system. Aoki [1976; Appendix D] has emphasized the usefulness of the method of root-locus to study parametric dependence of the closed-loop eigenvalues as a parameter varies. There, some eigenvalues of feedback systems are shown to approach zeros of the transfer functions, while the remainder go off to infinity following well established asymptotes. Harvey and Stein [1978] established analogous results when targets and instruments are both vectors of the same dimension. We follow them in broad outline and examine the asymptotic behavior of multiple-input and multiple-output feedback systems.

The problem is to minimize a criterion

$$\int_0^\infty (y_t'Qy_t + \rho\, x_t'Rx_t)\,dt$$

for a system

$$\dot z = Az + Bx, \qquad y = Cz$$

where B is (n × r) and C is (m × n), to find the best reaction function x = Fz, and to examine the resultant system

$$\dot z = (A + BF)z$$

as $\rho \downarrow 0$.

The reaction function x = Fz converts the dynamic system $\dot z = Az + Bx$ into the closed-loop or feedback system with dynamics $\dot z = (A + BF)z$. Assign a set of n eigenvalues of the dynamic matrix A + BF, subject to the complex conjugacy condition and all eigenvalues having negative real parts to make the matrix asymptotically stable. Moore [1976] showed that such an F exists if and only if (i) there exists a corresponding set of linearly independent eigenvectors $v_i$ such that $(A + BF)v_i = \lambda_iv_i$, subject to the complex conjugacy condition, i.e., if $\lambda_i = \bar\lambda_j$, then $v_i = \bar v_j$, and (ii) the vector $v_i$ is in the range space of $N_{\lambda_i}$ where $\begin{pmatrix} N_\lambda \\ M_\lambda\end{pmatrix}$ spans the null space of $[\lambda I - A, \; B]$.* Such a matrix F is unique if rank B = m = dim y, i.e., if the dimensions of y and x



* There is a vector $k_i$ such that $v_i = N_{\lambda_i}k_i$ and $-Fv_i = M_{\lambda_i}k_i$, i = 1 ~ n.
agree, because we can take B to be full rank without loss of generality.* Condition (ii) is the non-trivial condition of the two. The necessity of condition (ii) is easy to see. From $(A + BF)v_i = \lambda_iv_i$ follows $(\lambda_iI - A)v_i - BFv_i = 0$, or $\begin{pmatrix} v_i \\ -Fv_i\end{pmatrix} \in$ null space of $[\lambda_iI - A, \; B]$. This condition is conveniently expressed in terms of a matrix $T(\lambda)$ (called the return-difference matrix in the systems literature) as $T(\lambda_i)\mu_i = 0$ where $\mu_i = Fv_i$, and $T(\lambda) = I - F(\lambda I - A)^{-1}B$. (We also meet the return-difference matrices in identifying closed-loop systems in Chapter 12.) Since A + BF is uniquely determined by its (distinct) eigenvalues and eigenvectors (think of the spectral decomposition of A + BF), F is unique whenever the column vectors of B are linearly independent.



To establish sufficiency, suppose that a set of n linearly independent $v_i$, i = 1, ..., n, have been chosen subject to the complex conjugacy condition (i) and that $v_i$ is expressible as $v_i = N_{\lambda_i}k_i$. Hence $\{(\lambda_iI - A)N_{\lambda_i} + BM_{\lambda_i}\}k_i = 0$. We next show that F is determined by the conditions $Fv_i = -M_{\lambda_i}k_i$, i = 1 ~ n. Granting this for the moment, we then establish $0 = (\lambda_iI - A)v_i - BFv_i$ or $(A + BF)v_i = \lambda_iv_i$, i.e., $v_i$ is an eigenvector of A + BF with the eigenvalue $\lambda_i$. If all n eigenvalues are real, then $v_i$ and $-M_{\lambda_i}k_i$ are all real. Hence the real matrix F can be solved out from



* It is known that unless r ≥ m, a quadratic cost $\int_0^\infty (z'Qz + \rho x'Rx)\,dt$ cannot be reduced to zero even if $\rho \downarrow 0$. In other words, if r < m, then the minimum of the cost as $\rho \downarrow 0$ has a positive limit $z_0'P_0z_0$ where $P_0 > 0$. This result, obtained by Kwakernaak and Sivan [1972], can be transcribed for discrete time systems.

For this reason we examine the system where r = m with the additional assumption that B and C are full rank, i.e., rank (CB) = m. This can always be achieved. So it constitutes no real constraint on the problem we wish to examine here.
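$$F[v_1, \ldots, v_n] = [w_1, \ldots, w_n], \qquad\text{where } w_i = -M_{\lambda_i}k_i,$$

as $F = [w_1, \ldots, w_n][v_1, \ldots, v_n]^{-1}$. When some $\lambda$'s are complex conjugate, we need to manipulate the expressions to involve only real numbers by the usual transformation. We illustrate the procedure when $\lambda_1 = \bar\lambda_2$, $v_1 = \bar v_2$. Express $v_1 = \alpha + j\beta$. Then $v_2 = \alpha - j\beta$. Correspondingly $w_1 = \gamma + j\delta$ and $w_2 = \gamma - j\delta$. We have

$$[v_1, v_2, v_3, \ldots]\begin{pmatrix} \tfrac12 & \tfrac{1}{2j} & \\ \tfrac12 & -\tfrac{1}{2j} & \\ & & I\end{pmatrix} = [\alpha, \beta, v_3, \ldots],$$

$$[w_1, w_2, w_3, \ldots]\begin{pmatrix} \tfrac12 & \tfrac{1}{2j} & \\ \tfrac12 & -\tfrac{1}{2j} & \\ & & I\end{pmatrix} = [\gamma, \delta, w_3, \ldots],$$

and the equation to determine F now involves only real numbers:

$$F[\alpha, \beta, v_3, \ldots] = [\gamma, \delta, w_3, \ldots].$$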
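The construction of F from the null space of $[\lambda I - A, \; B]$ can be sketched in a few lines. The example assumes NumPy, real and distinct target eigenvalues, and a hypothetical single-instrument system, for which the null space is one-dimensional and the eigenvectors are therefore fixed up to scale; `null_space` and `assign` are names introduced here for illustration.

```python
import numpy as np

def null_space(M, tol=1e-10):
    """Orthonormal basis of the null space of M, computed from the SVD."""
    _, s, vh = np.linalg.svd(M)
    rank = int(np.sum(s > tol))
    return vh[rank:].conj().T

def assign(A, B, lams, ks):
    """Build F with (A + BF) v_i = lam_i v_i, v_i = N k_i, F v_i = -M k_i."""
    n = A.shape[0]
    V, W = [], []
    for lam, k in zip(lams, ks):
        basis = null_space(np.hstack([lam * np.eye(n) - A, B]))
        N, M = basis[:n], basis[n:]      # stacked as (N; M)
        V.append(N @ k)
        W.append(-M @ k)
    V, W = np.column_stack(V), np.column_stack(W)
    return W @ np.linalg.inv(V), V       # F = [w_1 ... w_n][v_1 ... v_n]^{-1}

# Hypothetical system and target eigenvalues.
A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([[0.0], [1.0]])
lams = [-4.0, -5.0]
F, V = assign(A, B, lams, ks=[np.array([1.0]), np.array([1.0])])
closed = A + B @ F
print(F, np.linalg.eigvals(closed))
```

For this system the feedback comes out as F = (-18, -6), which turns the characteristic polynomial into $\lambda^2 + 9\lambda + 20$. With more than one instrument the vectors $k_i$ would no longer be pinned down, which is the freedom in choosing eigenvectors discussed in the text.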

The matrix T is related to the transfer function by

$$T'(-s)(\rho R)T(s) = \rho R + H'(-s)QH(s) \tag{1}$$

where

$$H(s) = C(sI - A)^{-1}B,$$

when F is optimally chosen, i.e.,

$$F = -R^{-1}B'P/\rho$$

where P is the solution of the algebraic Riccati equation*

$$0 = A'P + PA + C'QC - PBR^{-1}B'P/\rho. \tag{2}$$

From the vanishing of $T(\lambda_i)\mu_i$ it follows that $H'(-\lambda_i)QH(\lambda_i)\mu_i = 0$, or $H(\lambda_i)\mu_i = 0$, in the limit $\rho \downarrow 0$, because H(s) has no zero in the right half plane.



* Noting that $\rho RF = -B'P$, expand the left-hand side as

$$\rho R + B'(-sI - A')^{-1}PB + B'P(sI - A)^{-1}B + B'(-sI - A')^{-1}P(BR^{-1}B'/\rho)P(sI - A)^{-1}B$$

$$= \rho R + B'(-sI - A')^{-1}\,[P(sI - A) + (-sI - A')P + P(BR^{-1}B'/\rho)P]\,(sI - A)^{-1}B$$

where the expression inside the square bracket equals C'QC by (2). This establishes the equality of (1).
Because $(\lambda_iI - A)^{-1}B\mu_i$ equals $v_i$, this condition is equivalently put as

$$Cv_i = 0, \qquad i = 1, \ldots, n-m,$$

since the null space of C is (n - m)-dimensional. The vectors $v_i$, i = 1 ~ n-m, span the null space of C. These (n - m) eigenvectors correspond to the (n - m) eigenvalues that remain finite as $\rho \downarrow 0$. The remaining m eigenvalues go off to infinity as $\rho \downarrow 0$. To capture them let $\lambda_i = \lambda_i^\infty/\sqrt\rho$. Then expanding H(s) in a Laurent series and setting $s = \lambda_i$, (1) becomes

$$T'(-s)(\rho R)T(s) = \rho\Big\{R - \frac{1}{\rho s^2}(CB)'Q(CB) + \cdots\Big\} = \rho\Big\{R - \frac{1}{(\lambda_i^\infty)^2}(CB)'Q(CB) + \cdots\Big\}.$$

Thus in the limit $\rho \downarrow 0$, the condition for the vector $\mu_i^\infty$ becomes

$$R\mu_i^\infty = (\lambda_i^\infty)^{-2}(CB)'Q(CB)\mu_i^\infty, \qquad i = 1, \ldots, m. \tag{3}$$

Define the m × m nonsingular matrix $N_\infty$ by

$$N_\infty = [\mu_1^\infty, \ldots, \mu_m^\infty].$$

Then (3) can be collectively written as

$$R^{-1}(CB)'Q(CB)N_\infty = N_\infty S_\infty^2 \tag{4}$$

where

$$S_\infty = \mathrm{diag}(\lambda_1^\infty, \ldots, \lambda_m^\infty).$$

This equation clearly establishes that the vectors $\mu_i^\infty$, i = 1 ~ m, are the eigenvectors of $R^{-1}(CB)'Q(CB)$. The matrices Q and R in the criterion function affect the asymptotic behavior not as a simple ratio $R^{-1}Q$ but rather as $R^{-1}(CB)'Q(CB)$. Since only this ratio matters, we may take R as

$$R = (N_\infty S_\infty^2 N_\infty')^{-1} \tag{5}$$

and let

$$(CB)'Q(CB) = (N_\infty N_\infty')^{-1} \tag{6}$$

because this choice preserves the key relation (4).
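The split into finite and unbounded closed-loop eigenvalues can be observed numerically. The sketch below assumes NumPy and a hypothetical single-input, single-output system with transfer function $(s + 3)/((s + 1)(s + 2))$, so that n = 2, m = 1 and CB = 1; the algebraic Riccati equation (2) is solved through the stable eigenvectors of the associated Hamiltonian matrix, a standard device that is not described in the text.

```python
import numpy as np

def care_via_hamiltonian(A, B, Qz, Ru):
    """Stabilizing solution of 0 = A'P + PA + Qz - P B Ru^{-1} B' P."""
    n = A.shape[0]
    Ham = np.block([[A, -B @ np.linalg.solve(Ru, B.T)],
                    [-Qz, -A.T]])
    w, vec = np.linalg.eig(Ham)
    stable = vec[:, w.real < 0]          # the n stable eigenvectors
    X1, X2 = stable[:n], stable[n:]
    return np.real(X2 @ np.linalg.inv(X1))

# Hypothetical example: C (sI - A)^{-1} B = (s + 3) / ((s + 1)(s + 2)).
A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([[0.0], [1.0]])
C = np.array([[3.0, 1.0]])
Q = np.array([[1.0]])
R = np.array([[1.0]])

def closed_loop_eigs(rho):
    """Eigenvalues of A + BF for the optimal F = -R^{-1} B' P / rho."""
    P = care_via_hamiltonian(A, B, C.T @ Q @ C, rho * R)
    F = -np.linalg.solve(R, B.T @ P) / rho
    return np.sort(np.linalg.eigvals(A + B @ F).real)

rho = 1e-4
fast, slow = closed_loop_eigs(rho)
print(fast, slow, fast * np.sqrt(rho))
```

As $\rho$ becomes small, one eigenvalue stays close to the transfer function zero at -3, while the other grows in magnitude like $1/\sqrt\rho$; here $R^{-1}(CB)'Q(CB) = 1$, so the scaled fast eigenvalue has magnitude close to one.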
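The consideration above suggests a coordinate system to examine the contribution to the quadratic cost. Let $T = [V^0, \; BN_\infty]$, and change variables to z = Tw, and $x = N_\infty v$.

By construction

$$AV^0 = V^0S_0 - BN_0$$

where $S_0 = \mathrm{diag}(\lambda_1^0, \ldots, \lambda_{n-m}^0)$, where $\lambda_i^0$ is the finite eigenvalue as $\rho \downarrow 0$, and $N_0 = [\mu_1^0, \ldots, \mu_{n-m}^0]$.*

Let $T^{-1} = \begin{pmatrix} T_1 \\ T_2\end{pmatrix}$. From $I_n = T^{-1}T$, we note that

$$T_1V^0 = I_{n-m}, \qquad T_1BN_\infty = 0,$$

and

$$T_2V^0 = 0 \quad\text{and}\quad T_2BN_\infty = I_m.$$

Noting that

$$T^{-1}AT = \begin{pmatrix} S_0 & A_{12} \\ -N_\infty^{-1}N_0 & A_{22}\end{pmatrix} \quad\text{and}\quad T^{-1}B = \begin{pmatrix} 0 \\ N_\infty^{-1}\end{pmatrix}$$

for some $A_{i2}$, i = 1, 2, the state equation for the new state vector w becomes

$$\dot w = \begin{pmatrix} S_0 & A_{12} \\ -N_\infty^{-1}N_0 & A_{22}\end{pmatrix} w + \begin{pmatrix} 0 \\ I_m\end{pmatrix} v \tag{7}$$

and

$$y = CTw = CBN_\infty w_2.$$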

The integrand of the cost function becomes, for our representation of R and Q in (5) and (6),

$$y'Qy + \rho x'Rx = w_2'w_2 + \rho\, v'S_\infty^{-2}v. \tag{8}$$

Equation (8) states that only the second subvector $w_2$, i.e., the subvector associated with fast modes (eigenvalues with large negative real


* When some of the $\lambda_i$ are complex, the corresponding sections of $S_0$ are (2 × 2) block diagonal submatrices.
values) contributes to the cost. Furthermore, it also reveals that the control cost, i.e., the cost associated with changing instruments, is weighted by the matrix $S_\infty^{-2}$. The subvector $w_1$ represents relatively slowly decaying modes (compared with the fast decaying modes of the subvector $w_2$) of the feedback system. The behavior of $w_1$ is determined by the zeros of the transfer function, which are the diagonal elements of $S_0$.

We can bound the behavior of $w_1$ by

$$\|w_1(t)\| = \Big\| e^{S_0t}w_1(0) + \int_0^t e^{S_0(t-\tau)}A_{12}w_2(\tau)\,d\tau \Big\|$$

where the second term is bounded from above by 



s o (t " T) 0 

He 0 a ° 2 w 2 (T)ll < 
where II is the solution of 



II H 2 w(0) 'IIw(O) 



o = h'H + iih + (° °) - E(°)s^(o, dh. 

As $\rho \downarrow 0$ this matrix $\Pi \to 0$ if the numerator $\psi(s)$ of the transfer function $C(sI - A)^{-1}B$, i.e., $\psi(s) = |sI - A|\,|C(sI - A)^{-1}B|$, has no zeros in Re s > 0, i.e., if the transfer function is of minimum phase (Kwakernaak and Sivan [1972]).

The effects of $w_1$ on the subvector $w_2$ eventually disappear and $w_2$ is essentially governed by

$$\dot w_2 = A_{22}w_2(t) + v.$$

By examining the (2, 2) submatrix of the Riccati equation as $\rho \downarrow 0$, the dominant term of the matrix P is $P_{22}$, which satisfies

$$I - \frac{1}{\rho}P_{22}S_\infty^2P_{22} = 0$$

or

$$P_{22} = \sqrt\rho\, S_\infty^{-1}.$$

The optimal feedback rule is

$$v = -(R^{-1}B'P/\rho)w \approx -(S_\infty^2/\rho)P_{22}w_2 = -(S_\infty/\sqrt\rho)\,w_2,$$

hence asymptotically $w_2$ is governed by

$$\dot w_2 = A_{22}w_2 - (S_\infty/\sqrt\rho)w_2 = (A_{22} - S_\infty/\sqrt\rho)w_2 \approx -(S_\infty/\sqrt\rho)w_2,$$

$$w_2(t) = e^{-(S_\infty/\sqrt\rho)t}\,w_2(0)$$

and

$$z(t, \rho) = V^0w_1 + BN_\infty w_2.$$

With discrete time systems a similar analysis is possible, or the results may be translated via the bilinear transformation. See Section 3.3. The representation corresponding to (7) is less dramatic since m eigenvalues go only to the circle |z| = 1 rather than going off to $\infty$ as $\rho \downarrow 0$. In the discrete-time version of this section, none of the eigenvalues go to infinity even when the cost of control approaches zero.

In the simpler situation of single-input, single-output dynamics, $x_{t+1} = Ax_t + bu_t$, $y_t = c'x_t$ where $c' = (c_0, c_1, \ldots, c_{n-\ell}, 0, \ldots, 0)$, and the criterion function $\sum_{t=0}^{\infty}(y_t'y_t + \rho u_t^2)$, the return difference matrix $T(z) = 1 + k'(zI - A)^{-1}b$ satisfies the factorization form

$$T'(z^{-1})(\rho + b'Pb)T(z) = \rho + b'(z^{-1}I - A')^{-1}cc'(zI - A)^{-1}b,$$

where the optimal rule is $u_t = -k'x_t$ with gain $k' = (\rho + b'Pb)^{-1}b'PA$, and P satisfies

$$P = A'PA + cc' - A'Pb(\rho + b'Pb)^{-1}b'PA.$$

The eigenvalues of the closed-loop system are the roots of

$$|zI - A + bk'| = |zI - A|\,(1 + k'(zI - A)^{-1}b)$$

or

$$T(z) = |zI - A + bk'| \,/\, |zI - A|.$$

Here as $\rho$ approaches zero, $(n - \ell)$ of the zeros of T(z) approach the zeros of the transfer function $c'(zI - A)^{-1}b$. The $\ell$ remaining zeros approach the origin. See Priel and Shaked [1983].




12 IDENTIFICATION 



We want to select a model from a prescribed class of models which best 
"reproduces" observed data, given the same set of exogenous input sequences. 

This is the subject of identification. 

Two notions of identifiability are found in the literature: consistency and uniqueness. Suppose that for a suitable parametrization of models, the parameter $\theta$ uniquely specifies a model within the class. We may then speak of the parameter as the model. Depending on the class of candidate models, the "true" model or $\theta_0$ may or may not be found in it. When it is, the convergence of the estimated parameter $\hat\theta$ to the true one is an issue. This is called consistency oriented identifiability (Wertz [1982]). Even when the true parameter value is not in the class, if each model in the class generates a distinct output sequence so that only one model or its parametric representation $\theta$ corresponds to a given input-output representation, then a "uniqueness oriented" identification can be examined. In other words, the examined issue is whether two models with different parameter values generate the same output sequences from the same input sequences, and hence are indistinguishable, or not.

Here we examine the latter notion, because statistical properties of estimating the parameters have been discussed extensively in the literature. Different parameter values must produce different probability distributions of data for the model to be called identifiable (Solo [1983]). Two models are observationally equivalent if the probability distributions of data are the same (in response to the same input sequences). If two observationally equivalent models are indeed the same then the model is identifiable. Taking the uniqueness view of identification, a function $g(\theta)$ of the parameter vector $\theta$ is identifiable if equality of two probability distributions of the data vector y, $p(y \mid \theta_1) = p(y \mid \theta_2)$, implies that $g(\theta_1) = g(\theta_2)$, i.e., observationally equivalent models assign the same value to $g(\theta)$.

When only the first and the second moment information is used in identification, each model is then parametrized by the mean of a vector and its covariance matrix. Two models in Markov or state space representation are indistinguishable if the covariance matrices are the same and the Markov parameters are identical, i.e., $H_i(\theta_1) = H_i(\theta_2)$, i = 0, 1, ..., because the impulse responses of state space models are completely specified by the set of Markov parameters $\{H_i\}$. In the ARMA representation, $A_2(z) = M(z)A_1(z)$ and $B_2(z) = M(z)B_1(z)$ for some unimodular matrix M(z) if and only if the Markov parameters coincide. Also recall that the Markov parameters are invariant with respect to similarity transformations in the state space, i.e., different choices of coordinate systems leave the Markov parameters invariant.
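The invariance of the Markov parameters under a change of coordinates is easy to confirm numerically. The sketch below assumes NumPy and a state space model written as a triple (A, B, C), with Markov parameters taken as $H_i = CA^iB$; the triple and the transformation matrix are hypothetical.

```python
import numpy as np

def markov_parameters(A, B, C, count):
    """Return [C B, C A B, C A^2 B, ...], the impulse response coefficients."""
    out, Ak = [], np.eye(A.shape[0])
    for _ in range(count):
        out.append(C @ Ak @ B)
        Ak = Ak @ A
    return out

# Hypothetical state space model.
A = np.array([[0.5, 0.2], [0.0, 0.3]])
B = np.array([[1.0], [1.0]])
C = np.array([[1.0, 0.0]])

# A nonsingular change of state coordinates.
T = np.array([[2.0, 1.0], [1.0, 1.0]])
Ti = np.linalg.inv(T)
A2, B2, C2 = T @ A @ Ti, T @ B, C @ Ti

h1 = markov_parameters(A, B, C, 6)
h2 = markov_parameters(A2, B2, C2, 6)
same = all(np.allclose(a, b) for a, b in zip(h1, h2))
print(same)
```

The two triples differ entry by entry but generate the same impulse response, so no amount of input-output data can distinguish them; only the coordinate-free content of the model is identifiable.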

If a time series $\{y_t\}$ is mean-zero Gaussian, its covariance matrices $R_y(k) = E(y_{t+k}y_t')$ completely specify the probability law for the $y_t$. Hence $(\theta_1, Q_1)$ and $(\theta_2, Q_2)$ are indistinguishable if and only if the covariance matrices are the same, $R_y(k; \theta_1, Q_1) = R_y(k; \theta_2, Q_2)$. Even when the data are not Gaussian, they are indistinguishable if the above equality holds so long as we deal only with second-order properties.

A function $g(\theta)$ of the parameter $\theta$ is estimable if there exists a function of data y, $\phi(y)$, such that $g(\theta) = E_\theta(\phi(y))$. Estimable functions are identifiable because $P_{\theta_1}(y) = P_{\theta_2}(y)$ implies that $g(\theta_1) = g(\theta_2)$, if $g(\theta)$ is estimable. In the ARMA models, the covariances $\{R_k(\theta)\}$ determine $P_\theta(y)$. Hence the coefficients in the AR polynomial are identifiable if and only if rank $H(\theta) = p$ because $R_p(\theta_1) = R_p(\theta_2)$ implies $H(\theta_1) = H(\theta_2)$ and $r_p(\theta_1) = r_p(\theta_2)$.
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12.1 Closed-Loop Systems 

Time series are often related to each other. As an example consider
two time series $\{\eta_t\}$ and $\{\zeta_t\}$

$$\eta_t = P(L)\zeta_t + u_t,$$

(1) and

$$\zeta_t = H(L)\eta_t + v_t,$$

where the matrices $P(L)$ and $H(L)$ are rational transfer matrices of $L$, and
$\{u_t\}$ and $\{v_t\}$ are the exogenous disturbances. We say that the two time series
are related by a closed-loop or feedback system because $\eta_t$ is dynamically
related to $\zeta_t$ which, in turn, is related to the original $\eta_t$, completing or
closing a loop.

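The time series are directly related to the exogenous sequences by solving
the simultaneous equation

$$\begin{bmatrix} I & -P(L) \\ -H(L) & I \end{bmatrix}
\begin{bmatrix} \eta_t \\ \zeta_t \end{bmatrix}
= \begin{bmatrix} u_t \\ v_t \end{bmatrix},
\qquad
\begin{bmatrix} \eta_t \\ \zeta_t \end{bmatrix}
= \begin{bmatrix} I & P(L) \\ H(L) & I \end{bmatrix} S
\begin{bmatrix} u_t \\ v_t \end{bmatrix},$$

where

$$S = \mathrm{diag}(S_1(L), S_2(L)),$$

$$S_1(L) = \{I - P(L)H(L)\}^{-1}$$

and

$$S_2(L) = \{I - H(L)P(L)\}^{-1}.$$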

The matrix S is known as the return difference matrix in the systems
literature and is known to be useful in stating various conditions for the
closed-loop systems in terms of the original transfer matrices. (It also
appears in stability and sensitivity analysis of closed-loop systems. See
Section 11.3.)

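Assume that the exogenous noises are related to stationary zero-mean white
noise sequences with finite covariances

$$\begin{bmatrix} u_t \\ v_t \end{bmatrix}
= \begin{bmatrix} N_1(L) & 0 \\ 0 & N_2(L) \end{bmatrix}
\begin{bmatrix} a_t \\ b_t \end{bmatrix},
\quad \text{where} \quad
\mathrm{cov}\begin{bmatrix} a_t \\ b_t \end{bmatrix} = \Sigma = \mathrm{diag}(\Sigma_1, \Sigma_2).$$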



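The joint relations between $\eta_t$, $\zeta_t$ and the noises $a_t$, $b_t$ are then stated
by the transfer function $G(L)$:

$$\begin{bmatrix} \eta_t \\ \zeta_t \end{bmatrix}
= G(L)\begin{bmatrix} a_t \\ b_t \end{bmatrix},$$

where

$$G(L) = \begin{bmatrix} G_{11}(L) & G_{12}(L) \\ G_{21}(L) & G_{22}(L) \end{bmatrix}
= \begin{bmatrix} I & P(L) \\ H(L) & I \end{bmatrix} S
\begin{bmatrix} N_1(L) & 0 \\ 0 & N_2(L) \end{bmatrix}.$$

The submatrices of G are related to the original transfer matrices by

(2)
$$G_{11}(L) = S_1(L)N_1(L), \qquad G_{12}(L) = P(L)S_2(L)N_2(L),$$
$$G_{21}(L) = H(L)S_1(L)N_1(L), \qquad G_{22}(L) = S_2(L)N_2(L).$$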



By assumption the matrix G is rational and stable. Additionally we assume
that there is no pole-zero cancellation so that the dimension of a minimal
realization of a pair of matrices $(P, N_1)$, call it $n_1$, and of $(H, N_2)$, denoted
by $n_2$, add up to $n = n_1 + n_2$, the dimension of a minimal realization of the
matrix G.

The spectrum of the closed-loop system then exists, is rational and is
given by

$$S(z) = G(z)\Sigma G'(z^{-1}).$$

We say that the transfer functions $P(L)$, $H(L)$, $N_1(L)$ and $N_2(L)$ are recoverable
if models can be constructed with transfer functions $\tilde P$, $\tilde H$, $\tilde N_1$, and $\tilde N_2$ which
satisfy the relations

$$\tilde H = H, \quad \tilde P = P, \quad \text{and} \quad
\tilde N_i(z)\tilde\Sigma_i\tilde N_i'(z^{-1}) = N_i(z)\Sigma_i N_i'(z^{-1}), \quad i = 1, 2.$$

The transfer functions for an open-loop model can be solved out from the
matrix G of (2):

(3)
$$P = G_{12}G_{22}^{-1}, \qquad H = G_{21}G_{11}^{-1},$$
$$N_1 = G_{11} - G_{12}G_{22}^{-1}G_{21}, \qquad N_2 = G_{22} - G_{21}G_{11}^{-1}G_{12}.$$
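
As a quick numerical illustration of (2) and (3), the following sketch treats the transfer matrices as constant matrices, i.e., as if they were evaluated at one fixed value of L. It builds G from (2) and then recovers P, H, N_1 and N_2 by the formulas in (3). The random test matrices and all variable names are assumptions made only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
k = 2
I = np.eye(k)

# Stand-ins for P(L), H(L), N_1(L), N_2(L) evaluated at a single point.
P = 0.2 * rng.standard_normal((k, k))
H = 0.2 * rng.standard_normal((k, k))
N1 = I + 0.2 * rng.standard_normal((k, k))
N2 = I + 0.2 * rng.standard_normal((k, k))

# Blocks of S = diag(S1, S2).
S1 = np.linalg.inv(I - P @ H)
S2 = np.linalg.inv(I - H @ P)

# Blocks of G as in (2).
G11, G12 = S1 @ N1, P @ S2 @ N2
G21, G22 = H @ S1 @ N1, S2 @ N2

# Recovery formulas as in (3).
P_rec = G12 @ np.linalg.inv(G22)
H_rec = G21 @ np.linalg.inv(G11)
N1_rec = G11 - G12 @ np.linalg.inv(G22) @ G21
N2_rec = G22 - G21 @ np.linalg.inv(G11) @ G12

recovered = all(
    np.allclose(x, y)
    for x, y in [(P_rec, P), (H_rec, H), (N1_rec, N1), (N2_rec, N2)]
)
print(recovered)
```

The check also exercises the identities $S_1P = PS_2$ and $S_2H = HS_1$, which are what make the two ways of writing the closed-loop solution agree.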



Suppose that (A, B, C) is a minimal realization of G, i.e.,

$$G(z) = I + C(zI - A)^{-1}B$$

where

$$C = \begin{bmatrix} C_1 \\ C_2 \end{bmatrix}, \quad \text{and} \quad B = [B_1, B_2],$$

$$G_{11} = I + C_1(zI - A)^{-1}B_1, \qquad G_{12} = C_1(zI - A)^{-1}B_2,$$
$$G_{21} = C_2(zI - A)^{-1}B_1, \qquad G_{22} = I + C_2(zI - A)^{-1}B_2.$$

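Then, a Markov model for the two time series is given by

$$w_{t+1} = Aw_t + Be_t,$$

$$\begin{bmatrix} \eta_t \\ \zeta_t \end{bmatrix} = Cw_t + e_t,$$

where

$$e_t = \begin{bmatrix} a_t \\ b_t \end{bmatrix}.$$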

In (3), we note that

$$G_{11}^{-1} = I - C_1(zI - A + B_1C_1)^{-1}B_1,$$

and

$$G_{22}^{-1} = I - C_2(zI - A + B_2C_2)^{-1}B_2.$$



Hence the transfer function P is given by

$$P = G_{12}G_{22}^{-1}
= C_1(zI - A)^{-1}B_2 - C_1(zI - A)^{-1}B_2C_2(zI - A + B_2C_2)^{-1}B_2$$
$$= C_1(zI - A)^{-1}[zI - A + B_2C_2 - B_2C_2](zI - A + B_2C_2)^{-1}B_2$$
$$= C_1(zI - A + B_2C_2)^{-1}B_2,$$

and the transfer function H becomes

$$H = G_{21}G_{11}^{-1} = C_2(zI - A + B_1C_1)^{-1}B_1.$$
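
The closed-form expressions for P and H can be checked numerically at a single test point z. The sketch below is only an illustration under stated assumptions: the matrices are random, the test point is an arbitrary complex number chosen away from the eigenvalues, and the variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 4, 2
A = 0.4 * rng.standard_normal((n, n))
B1, B2 = rng.standard_normal((n, k)), rng.standard_normal((n, k))
C1, C2 = rng.standard_normal((k, n)), rng.standard_normal((k, n))

z = 1.7 + 0.4j                       # arbitrary test point
I_n, I_k = np.eye(n), np.eye(k)
resolvent = np.linalg.inv(z * I_n - A)

# Blocks of G(z) = I + C (zI - A)^{-1} B.
G11 = I_k + C1 @ resolvent @ B1
G12 = C1 @ resolvent @ B2
G21 = C2 @ resolvent @ B1
G22 = I_k + C2 @ resolvent @ B2

# P and H computed directly from the blocks of G, as in (3).
P_direct = G12 @ np.linalg.inv(G22)
H_direct = G21 @ np.linalg.inv(G11)

# P and H from the closed-form state space expressions.
P_formula = C1 @ np.linalg.inv(z * I_n - A + B2 @ C2) @ B2
H_formula = C2 @ np.linalg.inv(z * I_n - A + B1 @ C1) @ B1

# Inverse of G11 from the matrix inversion identity used in the text.
G11_inv_formula = I_k - C1 @ np.linalg.inv(z * I_n - A + B1 @ C1) @ B1

print(np.allclose(P_direct, P_formula), np.allclose(H_direct, H_formula))
```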



Because the matrix $G(z)$ is minimum phase by assumption, $G(z)^{-1}$ has all poles
in $|z| < 1$. Then, the eigenvalues of $A - BC$ lie inside $|\lambda| < 1$, because
$G(z)^{-1} = I - C(zI - A + BC)^{-1}B$, and the poles of $G(z)^{-1}$ are the roots of
$|zI - A + BC| = 0$.

Hence

$$N_1^{-1} = I - C_1(B_1C_1 + zI - A + B_2C_2)^{-1}B_1$$

has all poles inside $|z| < 1$.

Similarly $N_2^{-1}$ also has all poles inside $|z| < 1$.

We state this as

Fact The transfer functions $N_1$ and $N_2$ are of minimum phase if $G(z)$ is
minimum phase.



12.2 Identifiability of a Closed-Loop System 

Here we follow Solo [1983] in establishing the identifiability of the
autoregressive part of the closed-loop system.

Write (1) as

$$\eta_t - P\zeta_t = u_t = N_1a_t,$$

or

$$a_t = N_1^{-1}(\eta_t - P\zeta_t)
= (N_1^{-1}, \; -N_1^{-1}P)\begin{bmatrix} \eta_t \\ \zeta_t \end{bmatrix}.$$

For $\{a_t\}$ to be stationary, $N_1^{-1}$ and $N_1^{-1}P$ are necessarily stationary, i.e., it
is necessary that $N_1$ is minimum phase and $N_1^{-1}P$ be stable. Let $\theta_0$ be the true
parameter value of the autoregressive part of the closed-loop system. Then a
fortiori, $N_1(\theta_0)$ is minimum phase and $N_1^{-1}(\theta_0)P(\theta_0)$ is stable.

Let

$$[P(\theta_0), N_1(\theta_0)] = A_0^{-1}(L)[B_0(L), C_0(L)].$$

Then $z^pC_0(z^{-1}) = 0$ has all roots of modulus less than 1.* The criterion
function $L(\theta) = E(e_t^2(\theta))$, where $e_t(\theta) = N_1^{-1}(\theta)(\eta_t - P(\theta)\zeta_t)$, is
a-identifiable if $L(\theta) = L(\theta_0)$ implies that $\theta = \theta_0$. Note that $e_t(\theta_0) = a_t$,
and we next establish $L(\theta) \ge \sigma^2 = E(a_t^2)$, $\forall\theta$. To see this, substitute the
system relation into



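$$e_t(\theta) = N_1^{-1}[I, -P]\begin{bmatrix} \eta_t \\ \zeta_t \end{bmatrix}$$
$$= N_1^{-1}[I, -P]
\begin{bmatrix} I & -P^0 \\ -H & I \end{bmatrix}^{-1}
\begin{bmatrix} N_1^0 & 0 \\ 0 & N_2^0 \end{bmatrix}
\begin{bmatrix} a_t \\ b_t \end{bmatrix}$$
$$= N_1^{-1}[I, -P]
\begin{bmatrix} (S_1N_1)^0 & (PS_2N_2)^0 \\ (HS_1N_1)^0 & (S_2N_2)^0 \end{bmatrix}
\begin{bmatrix} a_t \\ b_t \end{bmatrix}$$
$$= a_t + T_1a_t + T_2b_t$$

where the superscript 0 denotes evaluation at the true parameter value $\theta_0$,

$$T_1 = N_1^{-1}N_1^0 - I + N_1^{-1}(P^0 - P)(HS_1N_1)^0$$

and

$$T_2 = N_1^{-1}(P^0 - P)(S_2N_2)^0.$$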



* If $a(L) = a_0 + a_1L + \cdots + a_pL^p$ then $z^pa(z^{-1}) = a_0z^p + a_1z^{p-1} + \cdots + a_p$.

Note that $T_1(\theta_0) = 0$ and $T_2(\theta_0) = 0$. If $S_1^0(0) = I$ and $S_2^0(0) = I$, then $T_1$
is strictly causal, hence $L(\theta) \ge E(a_t^2)$. Thus $S^0(0) = I$ is necessary for the
identifiability of the closed-loop system.




13 TIME SERIES FROM RATIONAL EXPECTATIONS MODELS 



Expected future values of relevant endogenous and exogenous variables must
be incorporated in rational economic decisions. Time series are governed, then,
by a class of difference equations which involve conditionally expected values
of future y's as well as current and past y's: a class we have not discussed
so far. We follow Gourieroux et al. [1979] to characterize completely the
solutions of a first order difference equation for y in which $y_t$ and a one-step
ahead prediction term $y_{t+1|t}$ appear

(1) $$y_t = ay_{t+1|t} + u_t$$

where $a$ is a known scalar and $\{u_t\}$ is a mean-zero weakly stationary stochastic
process. The symbol $y_{t+1|t}$ denotes the conditional expectation of $y_{t+1}$ given
an information set $I_t$ where $I_t = \{\varepsilon_t, \varepsilon_{t-1}, \ldots\}$. Equations of the form (1)
arise in many economic models. See Aoki and Canzoneri [1979] for the solution
method in which terms related to $y_{t|t-1}$ rather than $y_{t+1|t}$ appear. As an example
leading to dynamics of the type (1), suppose that the money demand function
(in a high inflation economy) is specified by $m_t - p_t = \alpha(p_{t+1|t} - p_t)$ and the
money supply is $m_t = \mu_t$, where $p_t$ is the logarithm of price level. Then
$p_t = ap_{t+1|t} + u_t$ where $a = \alpha/(\alpha - 1)$ and $u_t = \mu_t/(1 - \alpha)$. Here $p_{t+1|t} - p_t$ is a
proxy for the interest rate because the expected inflation rate completely
dominates any other effects in a high inflation economy.

We consider three possibilities: (i) when $u_t$ is related to a basic
stochastic process by a MA process, (ii) by an AR process, and (iii) by
an ARMA process. First, to obtain the solution of (1), we need a particular
solution of the inhomogeneous part and general solutions of the homogeneous
part: $y_t = ay_{t+1|t}$. The general solution of the homogeneous part is related to
a martingale. This can be seen by converting $y_t = ay_{t+1|t}$ into
$a^ty_t = a^{t+1}y_{t+1|t}$, and defining $Z_t$ to be $a^ty_t$. Then this equation is the same
as the definition of a martingale, $E(Z_{t+1}|I_t) = Z_t$. Hence a general solution is
of the form $y_t = a^{-t}Z_t$ for any martingale $Z_t$. Denote a particular solution of
(1) by $\bar y_t$, and let $\tilde y_t$ be a general solution of the homogeneous part. Adding
these two together, $\bar y_t + \tilde y_t$ satisfies (1).

This "superposition" principle also works with respect to the $u_t$
specification. Suppose $u_t = \xi_t + \eta_t$ where $\xi$'s and $\eta$'s are mutually independent
mean-zero stochastic processes. Then a particular solution for (1) can be made up
as a sum of two separate particular solutions of (1), one with $u_t = \xi_t$, and the
other with $u_t = \eta_t$ as disturbances. This is because $y_t^1 = aE(y_{t+1}^1|\xi^t) + \xi_t$ and
$y_t^2 = aE(y_{t+1}^2|\eta^t) + \eta_t$ can be added together, because
$E(y_{t+1}^1|\xi^t) = E(y_{t+1}^1|\xi^t, \eta^t)$ and $E(y_{t+1}^2|\eta^t) = E(y_{t+1}^2|\xi^t, \eta^t)$ by the
independence of $\xi^t$ and $\eta^t$, where $\xi^t = \{\xi_t, \xi_{t-1}, \ldots\}$ and similarly for $\eta^t$.
Hence $y_t = y_t^1 + y_t^2$.

A method of undetermined coefficients provides a basic procedure for
solving (1) if the exogenous noises are independent. First, we illustrate it
step-by-step. After a few practice examples, we can bypass many intermediate
steps and proceed more directly to the solutions.



13.1 Moving Average Processes 



Suppose now that $u_t$ is MA(q), $u_t = C(L)\varepsilon_t = \varepsilon_t + C_1\varepsilon_{t-1} + \cdots + C_q\varepsilon_{t-q}$,
where $\varepsilon_t$ is a mean zero white noise process with unit variance. We assume
that all the roots of $C(L) = 0$ lie outside the unit circle. Because of
linearity of (1) and independence of $\varepsilon$'s, we look for a particular solution
to the equation

(2) $$y_t^i = ay_{t+1|t}^i + \varepsilon_{t-i}, \qquad i = 0, 1, \ldots, q.$$

Then, a particular solution $\bar y_t = \sum_{i=0}^q C_iy_t^i$, with $C_0 = 1$, satisfies (1).
Here the conditioning variables are $\varepsilon^t = (\varepsilon_t, \varepsilon_{t-1}, \ldots)$ which are common
to all i.

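Hypothesize a solution to (2) to be given by

$$y_t^i = \alpha_0\varepsilon_t + \alpha_1\varepsilon_{t-1} + \cdots + \alpha_i\varepsilon_{t-i}$$

where $\alpha$'s are to be determined by substituting this hypothesized solution form
into (2). Then advancing t by one in the above equation and projecting the
resulting expression on the subspace spanned by $\varepsilon^t$ we obtain

$$\alpha_0\varepsilon_t + \alpha_1\varepsilon_{t-1} + \cdots + \alpha_i\varepsilon_{t-i}
= a(\alpha_1\varepsilon_t + \alpha_2\varepsilon_{t-1} + \cdots + \alpha_i\varepsilon_{t+1-i}) + \varepsilon_{t-i}.$$

Comparing the coefficient of $\varepsilon_{t-j}$ with that on the right hand side, j = i,
i-1, ..., 0, we determine that

$$\alpha_i = 1, \quad \alpha_{i-1} = a, \quad \ldots, \quad \alpha_0 = a^i,$$

or

$$y_t^i = T_i(L)\varepsilon_t$$

where

$$T_i(L) = a^i + a^{i-1}L + \cdots + L^i.$$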

Consequently, a particular solution of (1) is

$$y_t = T(L)\varepsilon_t$$

where

$$T(L) = \sum_{i=0}^q C_iT_i(L).$$

To express $y_t$ in terms of $u_t$, multiply both sides by $C(L)$:

$$C(L)y_t = T(L)C(L)\varepsilon_t = T(L)u_t.$$

By assumption, the zeros of $C(L)$ all lie outside the unit circle so $1/C(L)$ is
a well-defined causal filter. We obtain a particular solution

$$y_t = \{T(L)/C(L)\}u_t.$$

This derivation does not reveal how $T(L)$ relates to $C(L)$, if at all. An
alternative procedure which we later discuss tells us that

$$T(L) = C(a) + L\{C(L) - C(a)\}/(L - a).$$

This can be verified by substitution.
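
The agreement between the sum $\sum_i C_iT_i(L)$ and the closed form $C(a) + L\{C(L) - C(a)\}/(L - a)$ can also be checked numerically. The sketch below is an illustration only: polynomial coefficients are stored lowest power first, the MA coefficients and the value of $a$ are made-up test values, and the function names are hypothetical.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def t_from_sum(c, a):
    """T(L) = sum_i C_i T_i(L), with T_i(L) = a^i + a^(i-1) L + ... + L^i."""
    q = len(c) - 1
    t = np.zeros(q + 1)
    for i, c_i in enumerate(c):
        for j in range(i + 1):
            t[j] += c_i * a ** (i - j)    # coefficient of L^j in C_i T_i(L)
    return t


def t_closed_form(c, a):
    """T(L) = C(a) + L {C(L) - C(a)} / (L - a), using exact polynomial division."""
    c_at_a = P.polyval(a, c)
    numerator = np.array(c, dtype=float)
    numerator[0] -= c_at_a                # C(L) - C(a), which vanishes at L = a
    quotient, remainder = P.polydiv(numerator, [-a, 1.0])
    t = P.polymulx(quotient)              # multiply the quotient by L
    t[0] += c_at_a
    return t, remainder


c = [1.0, 0.5, -0.3]                      # C(L) = 1 + 0.5 L - 0.3 L^2 (test values)
a = 0.8
t_sum = t_from_sum(c, a)
t_closed, remainder = t_closed_form(c, a)
print(t_sum, t_closed)
```

For these test values both routes give $T(L) = 1.208 + 0.26L - 0.3L^2$, and the division leaves no remainder because $C(L) - C(a)$ has a root at $L = a$.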

We now switch from L to z-variable, $L = z^{-1}$. The MA polynomial is

$$C(z^{-1}) = 1 + C_1z^{-1} + \cdots + C_qz^{-q} = z^{-q}(z^q + C_1z^{q-1} + \cdots + C_q).$$

So, in terms of the z-variable, all the finite zeros of $C(z^{-1})$ lie inside the
unit circle.

Now, hypothesize a particular solution of the form

$$y_t = \{\alpha + z^{-1}\gamma(z^{-1})\}\varepsilon_t$$

and see if $\alpha$ and $\gamma(z^{-1})$ exist that satisfy (1). Advance t by one in the above
and take its conditional expectation, $y_{t+1|t} = \gamma(z^{-1})\varepsilon_t$. Substitute this into
(1) to obtain a relation

$$\{\alpha + (z^{-1} - a)\gamma(z^{-1})\}\varepsilon_t = C(z^{-1})\varepsilon_t.$$

Setting $z^{-1}$ to $a$, we see that

$$\alpha = C(a).$$

Then $\gamma(\cdot)$ must be given by

$$\gamma(z^{-1}) = \{C(z^{-1}) - C(a)\}/(z^{-1} - a).$$

The right hand side is analytic in $z^{-1}$. For $\gamma(z^{-1})\varepsilon_t$ to be a well-defined
back-shift-invariant subspace in the Hilbert space of random variables,
$\gamma(\cdot)$ must be analytic in $|z| < 1$ and have zeros inside the unit circle.

13.2 Autoregressive Processes

Here $I_t = \{u_t, u_{t-1}, \ldots\} = \{\varepsilon_t, \varepsilon_{t-1}, \ldots\}$. Let $\phi(L)u_t = \varepsilon_t$ where
$\phi(L) = 1 + a_1L + \cdots + a_pL^p$ with all zeros outside the unit circle. The
polynomial $\phi(z^{-1})$ then has all finite zeros inside the unit circle. Try a
solution of the form

(3) $$y_t = b(L)u_t$$

where

$$b(L) = b_0 + b_1L + \cdots + b_{p-1}L^{p-1}.$$

The conditional expectation then becomes

$$y_{t+1|t} = b_0u_{t+1|t} + \beta(L)u_t$$

where

$$\beta(L) = b_1 + b_2L + \cdots + b_{p-1}L^{p-2}.$$

The conditional expectation $u_{t+1|t}$ is calculated analogously as
$u_{t+1|t} = -\alpha(L)u_t$, because $u_{t+1} + \alpha(L)u_t = \varepsilon_{t+1}$ where
$\alpha(L) = a_1 + a_2L + \cdots + a_pL^{p-1}$. Hence

(4) $$y_{t+1|t} = \{\beta(L) - b_0\alpha(L)\}u_t.$$

Substituting (3) and (4) into (1), we observe

$$b(L)u_t = a\{\beta(L) - b_0\alpha(L)\}u_t + u_t.$$

If the polynomial $b(\cdot)$ is chosen to satisfy

$$b_0 + \beta(L)(L - a) + ab_0\alpha(L) - 1 = 0$$

identically in L, then (3) is a particular solution.

Setting L to $a$ in the above, the constant $b_0$ is equal to

$$b_0 = 1/\{1 + a\alpha(a)\} = 1/\phi(a)$$

if $\phi(a)$ is not zero. Assuming this for now, the polynomial $\beta(L)$ is then
determined by

$$\beta(L) = \{1 - (1 + a\alpha(L))/\phi(a)\}/(L - a)
= -\{a/\phi(a)\}\{\alpha(L) - \alpha(a)\}/(L - a).$$

Hence

$$b(L) = b_0 + L\beta(L)
= \{1/\phi(a)\}\{1 - aL(\alpha(L) - \alpha(a))/(L - a)\}.$$

We can rewrite (3) as

$$\phi(L)y_t = b(L)\varepsilon_t,$$

hence $y_t$ is an ARMA(p, p-1).

When $\phi(a)$ is zero, the trial solution

$$y_t = \{b(L) + t(\gamma_0 + L\Gamma_0(L))\}u_t$$

where deg $\Gamma_0 = p - 2$ works.

If p = 1, then $b_0 = 0$ and $\Gamma_0(\cdot)$ is zero. If 1 is a root of $\phi(\cdot)$ of
multiplicity d then y is an ARIMA(p-d, d, p-1). See Monfort et al. [1982].



13.3 ARMA Models 



Consider

$$y_t = ay_{t+1|t} + u_t$$

where

$$\phi(L)u_t = C(L)\varepsilon_t$$

where the roots of $\phi$ and C all lie outside the unit circle.

Multiply the model by $\phi(L)$ to render it as

(5) $$\phi(L)y_t = a\phi(L)y_{t+1|t} + \phi(L)u_t = a\phi(L)y_{t+1|t} + C(L)\varepsilon_t.$$

Introduce an auxiliary variable $\eta_t$ by

$$\eta_t = \phi(L)y_t.$$

Then (5) is a first order difference equation for $\eta_t$,

$$\eta_t = a\eta_{t+1|t} + C(L)\varepsilon_t,$$

which is the MA form discussed above. Its particular solution has been
derived as $\eta_t = T(L)\varepsilon_t$ where $T(L) = C(a) + L\{C(L) - C(a)\}/(L - a)$. Hence

(6) $$y_t = \{T(L)/\phi(L)\}\varepsilon_t = \{T(L)/C(L)\}\{C(L)/\phi(L)\}\varepsilon_t = \{T(L)/C(L)\}u_t.$$

Thus a form $C(L)y_t = T(L)u_t$ is suggested as a possible solution, where
deg T = max(q, p-1).

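We need $u_{t+1|t}$ in calculating $y_{t+1|t}$. Write $u_{t+1}$ as

$$u_{t+1} = \{C(L)/\phi(L)\}\varepsilon_{t+1}
= (1 + C(L)/\phi(L) - 1)\varepsilon_{t+1}
= \varepsilon_{t+1} + \{C(L)/\phi(L) - 1\}\varepsilon_{t+1}.$$

Hence

$$u_{t+1|t} = \frac{1}{L}\left(\frac{C(L)}{\phi(L)} - 1\right)\varepsilon_t
= \frac{1}{L}\left(\frac{C(L)}{\phi(L)} - 1\right)\frac{\phi(L)}{C(L)}u_t
= \frac{1}{L}\left(1 - \frac{\phi(L)}{C(L)}\right)u_t.$$

Then advancing t by one in (6), and adding and subtracting an undetermined
constant $\tau_0$, we express

$$y_{t+1} = \{T(L)/C(L)\}u_{t+1}
= \tau_0u_{t+1} + \left(\frac{T(L)}{C(L)} - \tau_0\right)u_{t+1}
= \tau_0u_{t+1} + \frac{1}{L}\left(\frac{T(L)}{C(L)} - \tau_0\right)u_t,$$

hence

$$y_{t+1|t} = \tau_0u_{t+1|t} + \frac{1}{L}\left(\frac{T(L)}{C(L)} - \tau_0\right)u_t.$$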

Substituting this into the original equation

$$\frac{T(L)}{C(L)}u_t = \frac{a\tau_0}{L}\left(1 - \frac{\phi(L)}{C(L)}\right)u_t
+ \frac{a}{L}\left(\frac{T(L)}{C(L)} - \tau_0\right)u_t + u_t,$$

or

(7) $$\frac{T(L)}{C(L)} = \frac{a\tau_0}{L}\left(1 - \frac{\phi(L)}{C(L)}\right)
+ \frac{a}{L}\left(\frac{T(L)}{C(L)} - \tau_0\right) + 1.$$

Letting L = a, $\tau_0$ must satisfy

$$T(a)/C(a) = \tau_0(1 - \phi(a)/C(a)) + (T(a)/C(a) - \tau_0) + 1,$$

or if $\phi(a) \ne 0$, then $\tau_0 = C(a)/\phi(a)$.

Substituting this back into (7) determines

$$T(L) = \frac{1}{L - a}\left\{LC(L) - \frac{aC(a)}{\phi(a)}\phi(L)\right\}.$$

As pointed out by Monfort et al. this method is superior to the method of
Blanchard [1979] which works only in subcases, as the next example shows. In
the example Blanchard's method works only if $|a\varphi_1| < 1$.
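
Clearing denominators in (7) gives $(L - a)T(L) = LC(L) - a\tau_0\phi(L)$, so $T(L)$ can be obtained by one polynomial division once $\tau_0 = C(a)/\phi(a)$ is known. The sketch below illustrates this under stated assumptions: coefficients are stored lowest power first, the ARMA(1,1) coefficients and the value of $a$ are made-up test values, and the function names are hypothetical.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def arma_particular_solution(phi, c, a):
    """Return tau_0, the coefficients of T(L), and the division remainder.

    Uses (L - a) T(L) = L C(L) - a tau_0 phi(L); requires phi(a) != 0.
    """
    phi = np.asarray(phi, dtype=float)
    c = np.asarray(c, dtype=float)
    tau0 = P.polyval(a, c) / P.polyval(a, phi)
    numerator = P.polysub(P.polymulx(c), a * tau0 * phi)
    t_poly, remainder = P.polydiv(numerator, [-a, 1.0])
    return tau0, t_poly, remainder


def relation_7_gap(phi, c, a, tau0, t_poly, lag):
    """Absolute difference between the two sides of (7) at a numeric value of L."""
    ratio = P.polyval(lag, t_poly) / P.polyval(lag, c)
    phi_over_c = P.polyval(lag, phi) / P.polyval(lag, c)
    rhs = (a * tau0 / lag) * (1 - phi_over_c) + (a / lag) * (ratio - tau0) + 1
    return abs(ratio - rhs)


phi, c, a = [1.0, -0.6], [1.0, 0.4], 0.9      # test values only
tau0, t_poly, remainder = arma_particular_solution(phi, c, a)
gap = relation_7_gap(phi, c, a, tau0, t_poly, lag=0.3)

# Pure AR(1) special case u_t + 0.5 u_{t-1} = e_t: T reduces to the constant tau_0.
tau0_ar, t_ar, _ = arma_particular_solution([1.0, 0.5], [1.0], a)
print(tau0, t_poly, gap, t_ar)
```

In the AR(1) special case the routine returns the constant $1/(1 + 0.5a)$, which is the form $u_t/(1 + \varphi_1a)$ quoted in Example 1.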



13.4 Examples 



Example 1 Suppose

$$u_t + \varphi_1u_{t-1} = \varepsilon_t.$$

Then Blanchard's method solves this equation by successively advancing t:

$$u_{t+1} = -\varphi_1u_t + \varepsilon_{t+1}.$$

Hence $u_{t+1|t} = -\varphi_1u_t$. Similarly $u_{t+2} = -\varphi_1u_{t+1} + \varepsilon_{t+2}$. Hence
$u_{t+2|t} = -\varphi_1u_{t+1|t} = (-\varphi_1)^2u_t$ and $u_{t+i|t} = (-\varphi_1)^iu_t$ in general, $i > 0$. Hence

$$y_t = \sum_{i=0}^{\infty}(-a\varphi_1)^iu_t$$

converges if and only if $|a\varphi_1| < 1$. The Monfort procedure shows that
$y_t = u_t/(1 + \varphi_1a)$ is always a particular solution unless $1 + \varphi_1a = 0$.
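
The contrast in Example 1 can be seen with a few lines of arithmetic. The sketch below is an illustration with made-up parameter values and hypothetical function names: it compares partial sums of the geometric series from the forward-substitution method with the closed-form coefficient $1/(1 + \varphi_1a)$.

```python
def blanchard_partial_sum(a, phi1, terms):
    """Partial sum of sum_i (-a * phi1)^i, the coefficient on u_t."""
    ratio = -a * phi1
    return sum(ratio ** i for i in range(terms))


def closed_form_coefficient(a, phi1):
    """Coefficient 1 / (1 + phi1 * a); requires 1 + phi1 * a != 0."""
    return 1.0 / (1.0 + phi1 * a)


# Case |a * phi1| < 1: the series converges to the closed form.
convergent = blanchard_partial_sum(0.9, 0.5, 200)
target = closed_form_coefficient(0.9, 0.5)

# Case |a * phi1| > 1: partial sums blow up, the closed form stays finite.
divergent = blanchard_partial_sum(2.0, 0.8, 50)
finite = closed_form_coefficient(2.0, 0.8)

print(convergent, target, divergent, finite)
```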



Example 2 Consider a simple model of a closed economy given by

(8) $$y_t = -\sigma\{i_t - (p_{t+1|t} - p_t)\} + \eta_t,$$

(9) $$y_t = \alpha(p_t - p_{t|t-1}) + s_t,$$

(10) $$\mu_t = -ki_t + p_t + y_t.$$

Equation (8) is the aggregate demand equation. The aggregate supply function
is given by (9). Behind (9) is a wage contracting story. The demand for real
balances is given by (10), where the price index term drops out from both sides
by assuming unit income elasticity of demand for real balances.

Solve (10) for $i_t$:

$$i_t = (-\mu_t + p_t + y_t)/k.$$

Equating (8) with (9) we obtain the dynamic equation for the price time series:

(11) $$\pi_1p_t - \pi_2p_{t|t-1} - \sigma p_{t+1|t} = \eta_t + (\sigma/k)\mu_t - (1 + \sigma/k)s_t,$$

where

$$\pi_1 = \alpha(1 + \sigma/k) + \sigma(1 + 1/k), \qquad \pi_2 = \alpha(1 + \sigma/k).$$

Suppose the noises on the right hand side are specified by $\sum_jf_j(L)\varepsilon_t^j$.
Postulate that the solution is of the form $p_t = \sum_j(\gamma_j + Lh_j(L))\varepsilon_t^j$. Then
$p_{t|t-1} = \sum_jLh_j(L)\varepsilon_t^j$. Because $p_{t+1} = \sum_j(\gamma_j + Lh_j(L))\varepsilon_{t+1}^j$, we note that
$p_{t+1|t} = \sum_jh_j(L)\varepsilon_t^j$. Substituting the postulated solution form into (11),
we find that $\gamma_j$ and $h_j(z)$ must satisfy (see Futia [1979, 1981]) the following
relation

$$\pi_1\gamma_j + (\pi_1z - \pi_2z - \sigma)h_j(z) = f_j(z).$$

Let $\lambda$ be the root of $(\pi_1 - \pi_2)z - \sigma = 0$. Then we solve for $\gamma_j$ by

$$\gamma_j = f_j(\lambda)/\pi_1, \qquad \lambda = \sigma/(\pi_1 - \pi_2) = (1 + 1/k)^{-1},$$

and $h_j$ is given by

$$h_j(z) = (f_j(z) - \pi_1\gamma_j)/(\pi_1z - \pi_2z - \sigma)
= \{f_j(z) - f_j(\lambda)\}/[(\pi_1 - \pi_2)z - \sigma].$$

Note, however, that $h_j$ will not be analytic inside the unit disc $|z| < 1$, unless

$$f_j(z) - f_j(\lambda) = (z - \lambda)^{\ell}g_j(z), \qquad g_j(\lambda) \ne 0, \quad \ell \ge 1,$$

or

$$f_j(z) = c_j + (z - \lambda)g_j(z)$$

for some $c_j \ne 0$.

Some particular cases obtain by specializing $f_j(\cdot)$: Let $f_j(z) = c_j$. Then
$\gamma_j = c_j/\pi_1$ and $h_j = 0$; if $f_j(z) = c_j + (z - \lambda)d_j$, then $\gamma_j = c_j/\pi_1$ and
$h_j = d_j/(\pi_1 - \pi_2)$.
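
The construction of $\gamma_j$ and $h_j$ in Example 2 can be traced numerically for a polynomial $f_j$. The sketch below is an illustration under stated assumptions: the parameter values and the coefficients of $f_j$ are made up, coefficients are stored lowest power first, and the function name is hypothetical. It checks the relation $\pi_1\gamma_j + \{(\pi_1 - \pi_2)z - \sigma\}h_j(z) = f_j(z)$ at a test point.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def solve_price_coefficients(f, sigma, alpha, k):
    """Return pi_1, pi_2, lambda, gamma_j, h_j and the division remainder."""
    pi1 = alpha * (1 + sigma / k) + sigma * (1 + 1 / k)
    pi2 = alpha * (1 + sigma / k)
    lam = sigma / (pi1 - pi2)                 # root of (pi1 - pi2) z - sigma = 0
    f_at_lam = P.polyval(lam, f)
    gamma = f_at_lam / pi1
    numerator = np.array(f, dtype=float)
    numerator[0] -= f_at_lam                  # f_j(z) - f_j(lambda)
    h, remainder = P.polydiv(numerator, [-sigma, pi1 - pi2])
    return pi1, pi2, lam, gamma, h, remainder


f = [1.0, 0.5, 0.2]                           # f_j(z) = 1 + 0.5 z + 0.2 z^2 (test values)
sigma, alpha, k = 0.5, 1.0, 2.0
pi1, pi2, lam, gamma, h, remainder = solve_price_coefficients(f, sigma, alpha, k)

z = 0.4                                       # arbitrary test point
lhs = pi1 * gamma + ((pi1 - pi2) * z - sigma) * P.polyval(z, h)
rhs = P.polyval(z, f)
print(lam, gamma, h, lhs, rhs)
```

Because the numerator vanishes at $z = \lambda$, the division is exact and $h_j$ is a polynomial, hence analytic in the unit disc, for any polynomial $f_j$.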
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Example 3 A simple N-sector stochastic employment model 

The next example highlights the role of (disparate) information in generat- 
ing serial correlations even when exogenous disturbances are serially uncorrelated. 
We do not wish to imply, however, that this is the only or the most important 
source for business cycles. Nevertheless this example is interesting because 
it does illustrate one often overlooked source of serial correlations. 

Consider an economy with N sectors. The i-th sector employment level is
related to the sector output by

(12) $$y_t^i = \alpha L_t^i,$$

and changes with time according to

(13) $$L_{t+1}^i = (1 - \delta)L_t^i + \theta E_t^i(y_{t+1}^i - y_{t+1}^a) + g_{t+1}^i$$

where the average of y's is defined by

$$y_{t+1}^a = \frac{1}{N}\sum_{j=1}^Ny_{t+1}^j,$$

and $g_t^i$ is an exogenous disturbance to be specified presently. The symbol $E_t^i$
denotes the conditional expectation $E(\cdot|I_t^i)$ where $I_t^i$ is the information set of
sector i. Equation (13) describes processes of labor movements. Labor moves
to a sector with higher than average prospect of employment which, according
to (12), is equivalent to the sector with higher than (national) average output.*
Substituting (12) into (13) and defining $L_{t+1}^a$ as $\sum_{i=1}^NL_{t+1}^i/N$, we rewrite (13) as

(14) $$L_{t+1}^i = (1 - \delta)L_t^i + \alpha\theta E_t^i\ell_{t+1}^i + g_{t+1}^i$$

where we denote the deviation of the i-th sector employment from the average
by $\ell_{t+1}^i$, i.e., $\ell_{t+1}^i = L_{t+1}^i - L_{t+1}^a$.

Averaging this equation over N sectors, we obtain the dynamics for the
aggregate or macro-system:

(15) $$L_{t+1}^a = (1 - \delta)L_t^a + \frac{\alpha\theta}{N}\sum_jE_t^j\ell_{t+1}^j + g_{t+1}^a$$

where $g_t^a$ is the average disturbance defined to be $\sum_jg_t^j/N$.

* Labor and output may be interpreted in per capital stock. Then variables
measured from some trend (growth) path will be related by (12) and (13).

Taking the difference of (14) and (15) we note that

(16) $$\ell_{t+1}^i = (1 - \delta)\ell_t^i
+ \alpha\theta\Big(E_t^i\ell_{t+1}^i - \frac{1}{N}\sum_jE_t^j\ell_{t+1}^j\Big)
+ g_{t+1}^i - g_{t+1}^a.$$

The total system dynamics are described by the average behavior (15) and
deviations from the average (16).

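When we sum (16) over i we obtain $\sum_j\ell_{t+1}^j = (1 - \delta)\sum_j\ell_t^j$. Hence if
$\sum_j\ell_0^j = 0$, then $\sum_j\ell_t^j = 0$ for all $t > 0$. Even if $\sum_j\ell_0^j \ne 0$, the magnitude
$|\sum_j\ell_t^j|$ monotonically converges to 0 as t goes to infinity. We assume that
$\sum_j\ell_t^j = 0$ for all $t \ge 0$. We follow Futia [1979] and define a micro equilibrium
to be a collection of covariance stationary stochastic processes $\{\ell_t^i\}$ such that
$E_t^i\ell_t^i = \ell_t^i$ for all i = 1, ..., N, or using $\pi_t^i$ to denote orthogonal projection
onto $I_t^i$, $\pi_t^i\ell_t^i = \ell_t^i$. Taking the expectation of (16) yields

$$\pi_t^i\ell_{t+1}^i = (1 - \delta)\ell_t^i
+ \alpha\theta\Big\{\pi_t^i\ell_{t+1}^i - \frac{1}{N}\pi_t^i\Big(\sum_j\pi_t^j\ell_{t+1}^j\Big)\Big\}
+ \pi_t^i(g_{t+1}^i - g_{t+1}^a),$$

where we use $(\pi_t^i)^2 = \pi_t^i$, or

$$\pi_t^i\ell_{t+1}^i = (1 - \alpha\theta)^{-1}\Big\{(1 - \delta)\ell_t^i
- \frac{\alpha\theta}{N}\pi_t^i\Big(\sum_j\pi_t^j\ell_{t+1}^j\Big)
+ \pi_t^i(g_{t+1}^i - g_{t+1}^a)\Big\}.$$

Summing over i we note that

(17) $$\sum_i\pi_t^i\ell_{t+1}^i = -(1 - \alpha\theta)^{-1}\frac{\alpha\theta}{N}\sum_i\pi_t^i\Big(\sum_j\pi_t^j\ell_{t+1}^j\Big)
+ (1 - \alpha\theta)^{-1}\sum_i\pi_t^i(g_{t+1}^i - g_{t+1}^a)$$

where we used $\sum_j\ell_t^j = 0$.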

3 

Now we consider two cases in turn: a common information pattern and a
differential information pattern.

Case of Common Information Pattern Assume that $I_t^i = I_t$, hence the orthogonal
projection operator $\pi_t^i$ is the same as $\pi_t$ for all i. In this case (17) reduces
to a trivial relation "0 = 0". Equations (15) and (16) become

(18) $$L_{t+1}^a = (1 - \delta)L_t^a + g_{t+1}^a,$$

and

$$\ell_{t+1}^i = (1 - \delta)\ell_t^i + \alpha\theta\pi_t\ell_{t+1}^i + g_{t+1}^i - g_{t+1}^a,$$

or since $\pi_t\ell_{t+1}^i = (1 - \alpha\theta)^{-1}\{(1 - \delta)\ell_t^i + \pi_t(g_{t+1}^i - g_{t+1}^a)\}$, the dynamics become*

(19) $$\ell_{t+1}^i = \kappa\ell_t^i + \frac{\alpha\theta}{1 - \alpha\theta}\pi_t(g_{t+1}^i - g_{t+1}^a) + g_{t+1}^i - g_{t+1}^a$$

where

$$\kappa = (1 - \delta)/(1 - \alpha\theta).$$

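We assume that $\kappa < 1$, i.e., that $\delta > \alpha\theta$. Suppose now that the exogenous
disturbances on individual sectors are specified by

$$g_{t+1}^i = \{\varphi_0 + Lf_0(L)\}\varepsilon_{t+1}^0 + \{\varphi_i + Lf_i(L)\}\varepsilon_{t+1}^i$$

where $\varepsilon_t^0$, $\varepsilon_t^i$, i = 1, ..., N are the primitive independent random variables with
zero means and unit variances.

Thus

$$g_{t+1}^i - g_{t+1}^a = \{\varphi_i + Lf_i(L)\}\varepsilon_{t+1}^i - \frac{1}{N}\sum_j\{\varphi_j + Lf_j(L)\}\varepsilon_{t+1}^j.$$

The information set is the minimum closed subspace spanned by $\varepsilon_t^0$, $\varepsilon_t^1$, ..., $\varepsilon_t^N$
and their past values. Hence

$$\pi_t(g_{t+1}^i - g_{t+1}^a)
= \pi_t\Big[\{\varphi_i + Lf_i(L)\}\varepsilon_{t+1}^i - \frac{1}{N}\sum_j\{\varphi_j + Lf_j(L)\}\varepsilon_{t+1}^j\Big]
= f_i(L)\varepsilon_t^i - \frac{1}{N}\sum_jf_j(L)\varepsilon_t^j.$$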



Equation (19) thus becomes

$$(1 - \kappa L)\ell_{t+1}^i = \Big\{\varphi_i + \frac{1}{1 - \alpha\theta}Lf_i(L)\Big\}\varepsilon_{t+1}^i
- \frac{1}{N}\sum_j\Big\{\varphi_j + \frac{1}{1 - \alpha\theta}Lf_j(L)\Big\}\varepsilon_{t+1}^j.$$

Postulate that

(20) $$\ell_t^i = \sum_j\{\gamma_j^i + Lh_j^i(L)\}\varepsilon_t^j.$$

Then (20) satisfies (19) when we choose $\gamma$'s and h's by

$$\gamma_i^i = \varphi_i(N - 1)/N,$$

(21) $$h_i^i(L) = \frac{N - 1}{N(1 - \kappa L)}\Big\{\frac{1}{1 - \alpha\theta}f_i(L) + \kappa\varphi_i\Big\},$$

$$\gamma_j^i = -\varphi_j/N, \qquad
h_j^i(L) = -\frac{1}{N(1 - \kappa L)}\Big\{\frac{1}{1 - \alpha\theta}f_j(L) + \kappa\varphi_j\Big\}, \quad j \ne i,$$

where we note that $1/\kappa$ is greater than one due to our assumption. Equation
(21) shows that $h_j^i(z)$ is the same for all $i \ne j$. Hence we write $h_j$ for $h_j^i$.
The same for $\gamma$'s. Note that the solution (20) exhibits serially correlated
disturbances (business cycles), if and only if the exogenous disturbances
are serially correlated.

* This type of dynamic equation is not considered by Gourieroux et al. [1979].

From (18), the aggregate dynamics become

(18') $$L_{t+1}^a = (1 - \delta)L_t^a + \{\varphi_0 + Lf_0(L)\}\varepsilon_{t+1}^0 + \frac{1}{N}\sum_j\{\varphi_j + Lf_j(L)\}\varepsilon_{t+1}^j.$$

Suppose that $f_0$ and $f_j$ are all zero. Then the covariance sequence shows
no serial correlation.



Case of Differential Information Set Suppose now that $\pi_t^i$ is the orthogonal
projection onto the subspace spanned by $\varepsilon_t^0$, $\varepsilon_t^i$ and their lagged values. Let
the exogenous disturbances be the same as in the previous case.

Suppose 

Cl = V Y j + Lh j (L))e t+r 



Then 



= C L) v 



Hence 



viCi = h b i< L ><- 



Then 



Tr i (E j 7 T tC + x ) = C l) C 



Equation (16) becomes 



{1 - (1 - 6)lH+ = a0hi (L) ej; - ^.h^L)^ 

t+1 l t N 3 3 t 



+ {cp i + Lf i (L)}E t + i - &j {< Pj + 
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( 22 ) 



and 



Y . = -cp ./N 
3 3 

h j ( z) = - |{(1 - 6)cp j + f ± (z)}/{l - (1 - 6)z + ae/N}, 



N - 1 

N ' 



ht (z) 



-^{(1 - 6)cp ± + f i (z)}/{l - (1 - 6)z + a0/N}. 



The root $z = (1 + \alpha\theta/N)/(1 - \delta)$ lies outside the unit disc, hence $h_j(z)$ is
analytic inside $|z| < 1$, as required for the construction to be valid.

Because $\kappa > (1 - \delta)/(1 + \alpha\theta/N)$, comparison of (21) with (22) reveals
that serial correlations in $\ell_t^i$ die out more slowly with the common information
set in this simple example.

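The aggregate dynamic equation (15) becomes

(15') $$L_{t+1}^a = (1 - \delta)L_t^a + \frac{\alpha\theta}{N}\sum_jh_j^j(L)\varepsilon_t^j
+ \{\varphi_0 + Lf_0(L)\}\varepsilon_{t+1}^0 + \frac{1}{N}\sum_j\{\varphi_j + Lf_j(L)\}\varepsilon_{t+1}^j.$$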

Compare (15') with (18'). We note that the aggregate dynamics under differential 
information pattern, are more complex since the effects of is not simply 

ifj(L)e^ as in (18') but are given by 






(i - S)L 



(l-S)L 



ae 1 - <5 

N 



N 



As N becomes very large, the difference approaches zero, however.

Even when $f_0$ and $f_j$ are all zero, disturbances in (15') are now serially
correlated. This is the most significant consequence of the differential
information pattern.




14 NUMERICAL EXAMPLES 



Two vector-valued time series from monthly observations on the Japanese
economy have been used to estimate innovation models. The vector $y_t$ is
five-dimensional in one case and six-dimensional in the other, both covering the
period of January 1975 to January 1982. All together there are 85 monthly
observations per component. The five components are $M_2$ + CD (money supply
outstanding, and quasi money plus certificates of deposit) in 0.1 billion ¥;
call rate (in Tokyo, unconditional, average; free rate after April 1979);
exchange rate, ¥/$ (customs clearance conversion rate, exports); production
index, mining and manufacturing (seasonally adjusted); and the wholesale price
index, all commodities. The base year for these two indices is 1975. The sixth
component for the second time series records the current account in million
dollars. The five data series are plotted in Figures 1 through 5. They are tabulated
in Table 1. The three series for $M_2$ + CD, WPI and index of production are further
processed by taking the first difference of their respective logarithms. They
are shown in Figures 6 through 8, where L stands for first difference of the logarithms.
We note seemingly random scatters rather than trend growths that are visible in
the original data. They are tabulated in Table 2.* The data are further
transformed by subtracting sample means and dividing by the sample standard deviation
to produce the two time series $\{y_t\}$ which are both mean zero, and of full rank.
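
The two preprocessing steps described above, first differences of logarithms followed by standardization, can be sketched as follows. This is an illustration only: the function names are hypothetical, the input is the first eight WPI entries of Table 1, and the divisor N - 1 in the sample standard deviation is an assumption since the text does not say which divisor was used.

```python
import numpy as np


def log_difference(series):
    """First difference of the logarithms; the result is one observation shorter."""
    values = np.asarray(series, dtype=float)
    return np.diff(np.log(values))


def standardize(data, ddof=1):
    """Subtract the sample mean and divide by the sample standard deviation.

    ddof=1 (divisor N - 1) is an assumed convention, not taken from the text.
    """
    values = np.asarray(data, dtype=float)
    return (values - values.mean(axis=0)) / values.std(axis=0, ddof=ddof)


wpi = [100.4, 99.7, 99.3, 99.5, 99.5, 99.3, 99.4, 99.9]   # first eight WPI entries of Table 1
growth = log_difference(wpi)
scaled = standardize(growth)
print(len(growth), scaled.mean(), scaled.std(ddof=1))
```

Differencing shortens each series by one observation, which is why at most 84 of the 85 monthly observations remain usable.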

The first 70 data points were used to fit an AR model to the five-dimensional
series by an AIC program supplied by Dr. H. Akaike. The program produced AR(2)

$$y_t = B(1)y_{t-1} + B(2)y_{t-2} + x_t$$

where the 5x5 matrices B(i), i = 1, 2 and the covariance of $x_t$ are printed
out in Table 3.

* Because of differencing, the maximum usable data points are 84.




Table 1

  t     ICO       EX        CALL      M             WPI       CUA
  1   98.8000   300.890   12.6740   .106833E+07   100.400   -1164.00
  2   97.9000   297.100   13.0000   .107348E+07   99.7000    109.000
  3   96.6000   287.920   12.9200   .109375E+07   99.3000    130.000
  4   98.9000   290.570   12.0200   .111028E+07   99.5000    185.000
  5   98.9000   291.940   11.0600   .111588E+07   99.5000   -574.000
  6   99.9000   291.840   10.7200   .113824E+07   99.3000    89.0000
  7   101.000   295.610   11.0000   .114483E+07   99.4000   -4.00000
  8   100.800   297.310   10.6920   .115610E+07   99.9000    22.0000
  9   101.700   298.100   9.66700   .116459E+07   100.200   -41.0000
 10   102.100   302.380   8.73100   .117207E+07   100.700   -156.000
 11   100.400   302.040   7.60900   .120570E+07   100.700    37.0000
 12   102.600   304.750   7.96300   .125330E+07   101.300    685.000
 13   104.400   305.470   7.28300   .123065E+07   102.100   -1081.00
 14   106.900   302.720   7.00000   .124883E+07   102.800    147.000
 15   108.600   301.450   7.00000   .126235E+07   103.300    825.000
 16   110.000   299.280   6.75000   .128177E+07   103.900    292.000
 17   109.600   299.010   6.75000   .129405E+07   104.200    226.000
 18   111.700   299.840   6.90400   .132193E+07   104.700    423.000
 19   112.700   296.840   7.08300   .132389E+07   105.600    410.000
 20   113.000   292.760   7.25000   .132379E+07   106.100    13.0000
 21   112.800   288.170   7.05200   .134482E+07   106.500    560.000
 22   112.800   288.530   6.77000   .135556E+07   106.800    637.000
 23   114.500   294.100   6.77100   .137034E+07   107.100    40.0000
 24   115.100   295.680   7.11100   .142249E+07   107.300    1188.00
 25   115.900   292.450   7.00000   .139133E+07   107.200   -650.000
 26   114.300   288.270   7.00000   .139423E+07   107.500    683.000
 27   116.000   282.440   6.69200   .142350E+07   107.500    860.000
 28   115.300   275.720   5.87000   .143041E+07   107.500    1226.00
 29   114.800   277.660   5.18200   .143960E+07   107.700    85.0000
 30   115.700   275.640   5.47600   .147144E+07   107.300    872.000
 31   113.500   267.590   5.65900   .148676E+07   106.800    1494.00
 32   116.000   265.710   5.75000   .147124E+07   107.000    669.000
 33   115.800   267.120   4.97900   .148910E+07   107.100    1098.00
 34   115.000   261.590   4.91500   .148856E+07   106.800    1316.00
 35   117.300   249.170   4.62000   .151905E+07   106.100    1111.00
 36   118.100   241.690   5.01400   .158033E+07   105.700    2154.00
 37   118.700   240.800   4.78800   .154040E+07   105.600   -266.000
 38   119.400   241.440   4.80400   .154600E+07   105.700    1835.00
 39   120.700   236.630   4.62000   .157332E+07   105.600    2402.00
 40   121.400   222.970   4.14100   .161168E+07   105.200    1680.00
 41   122.000   225.390   4.06000   .161041E+07   105.500    634.000
 42   122.200   222.710   4.10600   .165076E+07   105.100    2265.00
 43   122.100   205.270   4.44200   .165489E+07   104.100    1989.00
 44   123.600   190.940   4.39400   .165349E+07   103.200    1246.00
 45   124.700   190.920   4.25000   .167462E+07   103.100    1911.00
 46   125.400   187.700   4.18000   .167206E+07   102.500    393.000
 47   125.900   184.890   3.93200   .170669E+07   102.700    592.000
 48   127.300   196.530   4.56700   .178720E+07   103.300    1853.00
 49   127.700   196.240   4.28800   .172604E+07   103.900   -1462.00
 50   129.300   199.150   4.34800   .173261E+07   104.800    262.000
 51   129.100   203.630   4.63900   .177588E+07   105.700    489.000
 52   130.000   211.310   4.88540   .181619E+07   107.500   -345.000
 53   132.200   217.540   5.11500   .180470E+07   109.200   -889.000
 54   132.500   219.790   5.34380   .184497E+07   110.600    108.000
 55   134.200   217.250   5.80290   .184227E+07   112.700   -939.000
 56   134.700   216.160   6.68520   .184383E+07   114.500   -1510.00
 57   133.600   220.590   6.80980   .187794E+07   116.100   -780.000
 58   136.400   225.520   6.74280   .185546E+07   117.400   -1086.00
 59   138.200   238.650   7.58070   .188678E+07   119.200   -2294.00
 60   138.500   243.760   8.04570   .195013E+07   121.400   -308.000
 61   140.100   237.400   8.05710   .190033E+07   124.000   -3372.00
 62   146.100   240.460   8.73960   .190956E+07   127.200   -1250.00
 63   142.700   247.450   10.7300   .194735E+07   129.800   -1188.00
 64   144.500   252.520   12.2100   .198030E+07   133.300   -1784.00
 65   143.000   237.750   12.5625   .196899E+07   133.100   -1861.00
 66   142.400   221.170   12.6425   .200250E+07   133.000   -888.000
 67   142.600   217.950   12.7014   .198973E+07   133.500   -951.000
 68   137.200   224.660   12.0865   .200767E+07   134.500   -913.000
 69   141.800   218.730   11.4036   .199238E+07   134.100    853.000
 70   143.100   210.240   11.0361   .198972E+07   133.100   -17.0000
 71   141.300   211.280   9.50000   .204781E+07   133.200   -506.000
 72   143.500   212.420   9.48840   .208986E+07   133.000    1131.00
 73   143.800   203.500   8.90760   .203756E+07   132.300   -2724.00
 74   143.900   203.500   8.60330   .205005E+07   132.100   -129.000
 75   144.200   207.760   8.03500   .208097E+07   132.100    777.000
 76   144.700   212.250   7.18500   .212114E+07   132.700    449.000
 77   143.000   217.510   7.05730   .216326E+07   133.800   -382.000
 78   146.300   223.990   7.11780   .217792E+07   134.400    1388.00
 79   147.600   226.620   7.25930   .217770E+07   135.000    940.000
 80   146.300   236.070   7.23560   .217886E+07   135.700    477.000
 81   149.400   230.270   7.25780   .219201E+07   135.700    2114.00
 82   151.000   229.050   7.05050   .220312E+07   135.500    1788.00
 83   150.900   230.150   6.79890   .224106E+07   135.300   -1061.00
 84   150.200   218.180   6.70140   .232042E+07   135.100    1133.00


85 


149.300 


221.650 


6.57610 


. 228666E+07 


135.100 


-1892.00 




1 


2 


3 


4 


5 


6 
Table 2

        LKO            LM             LW

  2   -.915099E-02    .481369E-02   -.699650E-02
  3   -.133678E-01    .187009E-01   -.402020E-02
  4    .235305E-01    .149992E-01    .201220E-02
  5    .0             .503559E-02    .0
  6    .100604E-01    .198337E-01   -.201220E-02
  7    .109509E-01    .577297E-02    .100660E-02
  8   -.198228E-02    .979962E-02    .501757E-02
  9    .888904E-02    .731683E-02    .299853E-02
 10    .392536E-02    .639977E-02    .497761E-02
 11   -.167905E-01    .282948E-01    .0
 12    .216757E-01    .387205E-01    .594052E-02
 13    .173918E-01   -.182448E-01    .786634E-02
 14    .236641E-01    .146687E-01    .683260E-02
 15    .157776E-01    .107656E-01    .485202E-02
 16    .128090E-01    .152677E-01    .579158E-02
 17   -.364307E-02    .953723E-02    .288326E-02
 18    .189794E-01    .213122E-01    .478699E-02
 19    .891271E-02    .148536E-02    .855919E-02
 20    .265842E-02   -.778040E-04    .472367E-02
 21   -.177159E-02    .157591E-01    .376303E-02
 22    .0             .795598E-02    .281283E-02
 23    .149586E-01    .108472E-01    .280508E-02
 24    .522641E-02    .373471E-01    .186564E-02
 25    .692646E-02   -.221495E-01   -.932316E-03
 26   -.139012E-01    .208361E-02    .279463E-02
 27    .147637E-01    .207778E-01    .0
 28   -.605287E-02    .483899E-02    .0
 29   -.434594E-02    .640559E-02    .185871E-02
 30    .780923E-02    .218776E-01   -.372102E-02
 31   -.191976E-01    .103557E-01   -.467072E-02
 32    .217874E-01   -.104883E-01    .187102E-02
 33   -.172573E-02    .120649E-01    .934057E-03
 34   -.693233E-02   -.366731E-03   -.280508E-02
 35    .198025E-01    .202760E-01   -.657585E-02
 36    .679699E-02    .395518E-01   -.377710E-02
 37    .506763E-02   -.255922E-01   -.946579E-03
 38    .587987E-02    .362689E-02    .946579E-03
 39    .108289E-01    .175178E-01   -.946579E-03
 40    .578272E-02    .240910E-01   -.379501E-02
 41    .493021E-02   -.787687E-03    .284768E-02
 42    .163798E-02    .247463E-01   -.379876E-02
 43   -.818715E-03    .250056E-02   -.956030E-02
 44    .122102E-01   -.848148E-03   -.868306E-02
 45    .886036E-02    .126957E-01   -.969521E-03
 46    .559775E-02   -.152928E-02   -.583650E-02
 47    .397931E-02    .205006E-01    .194929E-02
 48    .110585E-01    .460949E-01    .582517E-02
 49    .313733E-02   -.348221E-01    .579158E-02
 50    .124515E-01    .380033E-02    .862481E-02
 51   -.154796E-02    .246643E-01    .855121E-02
 52    .694722E-02    .224449E-01    .168860E-01
 53    .167815E-01   -.634488E-02    .156902E-01
 54    .226674E-02    .220698E-01    .127390E-01
 55    .127486E-01   -.146342E-02    .188094E-01
 56    .371886E-02    .844795E-03    .158454E-01
 57   -.819987E-02    .183326E-01    .138770E-01
 58    .207415E-01   -.120465E-01    .111350E-01
 59    .131102E-01    .167428E-01    .152159E-01
 60    .216844E-02    .330223E-01    .182881E-01
 61    .114861E-01   -.258680E-01    .211907E-01
 62    .419349E-01    .484738E-02    .254791E-01
 63   -.235468E-01    .195940E-01    .202341E-01
 64    .125350E-01    .167804E-01    .266074E-01
 65   -.104349E-01   -.572661E-02   -.150148E-02
 66   -.420467E-02    .168756E-01   -.751528E-03
 67    .140349E-02   -.639894E-02    .375235E-02
 68   -.386037E-01    .897589E-02    .746272E-02
 69    .329778E-01   -.764544E-02   -.297848E-02
 70    .912609E-02   -.133648E-02   -.748506E-02
 71   -.126584E-01    .287770E-01    .751078E-03
 72    .154498E-01    .203262E-01   -.150261E-02
Table 3

MATRIX B(1)     (5 x 5)

            1              2              3              4              5
 1   -0.11733D+01    0.26499D+01   -0.31725D+02   -0.13199D+03    0.92415D+02
 2   -0.35550D-02   -0.11664D+01    0.36755D+01   -0.33368D+01   -0.27138D+01
 3   -0.24934D-03    0.21824D-02    0.47058D+00    0.39212D-01   -0.22154D-01
 4    0.46405D-04    0.38607D-02   -0.76165D-01    0.60926D+00   -0.27524D+00
 5   -0.31625D-03    0.91974D-03   -0.12382D+00    0.45711D+00    0.16445D-01

MATRIX B(2)     (5 x 5)

            1              2              3              4              5
 1    0.22445D+00   -0.32331D+01    0.91775D+01   -0.16796D+03    0.51880D+02
 2    0.48969D-02    0.23043D+00    0.15188D+01    0.34976D+01   -0.93840D+01
 3    0.18649D-03   -0.15371D-02    0.46790D+00   -0.52973D-01   -0.48617D-01
 4   -0.55651D-04   -0.28898D-02   -0.14196D+00    0.24543D+00   -0.25477D+00
 5    0.28290D-03    0.37448D-03   -0.23494D+00    0.23501D+00   -0.41859D-01

MATRIX S        (5 x 5)

            1              2              3              4              5
 1    0.76064D+02    0.33648D+01    0.13081D-01    0.26044D-02    0.28933D-01
 2    0.33648D+01    0.73099D+00    0.26507D-03   -0.37685D-03    0.11730D-02
 3    0.13081D-01    0.26507D-03    0.16339D-03   -0.13389D-04   -0.19593D-04
 4    0.26044D-02   -0.37685D-03   -0.13389D-04    0.11355D-03    0.97673D-04
 5    0.28933D-01    0.11730D-02   -0.19593D-04    0.97673D-04    0.22893D-03

Figures 9 ~ 13 plot the predicted values and actual values. The symbol
A denotes a (long-range) predicted value with no observation, while the letter
x denotes predicted values for which an observation, marked by •, is available.

The data $\{y_t\}$, t = 1, ..., N, are also used to calculate the sample covariance
matrices $\Lambda_\ell$, calculated by $\sum_{t=1}^{N-\ell} y_{t+\ell}\, y_t' / N$, to construct the
Hankel matrix.

Its singular values and the eigenvectors to construct U and V, needed in the
singular value decomposition, are also calculated. By inspection of the singular
values, n = 10 seems to be the largest useful dimension of approximate innovation
models and the matrices A, C, and M were calculated accordingly following the
procedure of Section 9.2. Visual inspection of the singular values reveals that
n = 1, 2 and 6 could also be possible dimensions of the state space models.
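
The covariance and Hankel-matrix step just described can be sketched numerically as follows. This is only an illustration under stated assumptions: the data are random placeholders rather than the series of Table 1, the function names are ours, and the block arrangement (block (i, j) equal to $\Lambda_{i+j+1}$) is one common convention that the text itself does not spell out.

```python
import numpy as np

def sample_covariances(y, max_lag):
    """Return [Lambda_0, ..., Lambda_max_lag], Lambda_l = sum_t y_{t+l} y_t' / N."""
    y = np.asarray(y, dtype=float)
    n_obs = y.shape[0]
    return [y[lag:].T @ y[:n_obs - lag] / n_obs for lag in range(max_lag + 1)]

def hankel_matrix(covs, n_blocks):
    """Block Hankel matrix whose (i, j) block is Lambda_{i+j+1} (assumed layout)."""
    rows = [[covs[i + j + 1] for j in range(n_blocks)] for i in range(n_blocks)]
    return np.block(rows)

rng = np.random.default_rng(0)
y = rng.standard_normal((74, 5))      # placeholder data, NOT the series of Table 1
y -= y.mean(axis=0)                   # work with deviations from the sample mean
covs = sample_covariances(y, max_lag=5)
H = hankel_matrix(covs, n_blocks=3)   # needs lags 1 .. 2 * n_blocks - 1
U, sing_vals, Vt = np.linalg.svd(H)
print(np.round(sing_vals, 3))         # look for large drops between successive values
```

A large drop between successive singular values is what suggests a candidate state dimension n.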

As examples, we choose n to be 2 and 3 with N = 74 and 84 and numerically
solve the algebraic Riccati equation to construct an approximate innovation model

      $z_{t+1} = A z_t + \Gamma e_t$,

      $y_t = C z_t + e_t$,

where

      $E\, e_t e_t' = \Delta$,

      $Z = A Z A' + (M - A Z C')(\Lambda_0 - C Z C')^{-1}(M - A Z C')'$,

      $K = M - A Z C'$,

      $\Delta = \Lambda_0 - C Z C'$,

      $\Gamma = K \Delta^{-1}$,

following the procedure of Section 10.3. The model is then used to predict
out-of-sample y's by first calculating $z_{N+1|N}$ from
$z_{t+1|t} = (A - \Gamma C) z_{t|t-1} + \Gamma y_t$, $z_{0|-1} = 0$, t = 0, ..., N.
Then the predicted values are generated by

      $y_{N+k|N} = C z_{N+k|N} = C A^{k-1} z_{N+1|N}$,   k = 1, 2, ...
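
A minimal numerical sketch of this construction is given below. It is not the procedure of Section 10.3 itself: it assumes that the Riccati equation can be solved by simple fixed-point iteration started from Z = 0, and it writes Gamma for the gain and Delta for the innovation covariance (our labels). The scalar numbers are hypothetical and chosen so that the answer can be checked by hand (Z = 0.12, Gamma = 0.3).

```python
import numpy as np

def solve_riccati(A, C, M, Lam0, n_iter=500, tol=1e-12):
    """Iterate Z <- A Z A' + K Delta^{-1} K', K = M - A Z C', Delta = Lam0 - C Z C'."""
    Z = np.zeros_like(A, dtype=float)
    for _ in range(n_iter):
        K = M - A @ Z @ C.T
        Delta = Lam0 - C @ Z @ C.T
        Z_new = A @ Z @ A.T + K @ np.linalg.solve(Delta, K.T)
        converged = np.max(np.abs(Z_new - Z)) < tol
        Z = Z_new
        if converged:
            break
    K = M - A @ Z @ C.T
    Delta = Lam0 - C @ Z @ C.T
    Gamma = K @ np.linalg.inv(Delta)
    return Z, Delta, Gamma

def forecast(A, C, Gamma, y, horizon):
    """Filter through the sample, then return y_{N+k|N} for k = 1, ..., horizon."""
    z = np.zeros(A.shape[0])
    for y_t in y:                      # z_{t+1|t} = (A - Gamma C) z_{t|t-1} + Gamma y_t
        z = (A - Gamma @ C) @ z + Gamma @ y_t
    preds = []
    for _ in range(horizon):           # y_{N+k|N} = C A^{k-1} z_{N+1|N}
        preds.append(C @ z)
        z = A @ z
    return np.array(preds)

# Hypothetical scalar example: A = 0.5, C = 1, M = 0.36, Lambda_0 = 1.12
A = np.array([[0.5]])
C = np.array([[1.0]])
M = np.array([[0.36]])
Lam0 = np.array([[1.12]])
Z, Delta, Gamma = solve_riccati(A, C, M, Lam0)
preds = forecast(A, C, Gamma, np.array([[1.0], [2.0]]), horizon=2)
print(Z, Delta, Gamma, preds)
```

The iteration is only one of several ways to solve the equation; it is used here because each step mirrors the formulas for K, Delta and Z directly.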



The covariance matrix $\Lambda_0$ with 74 data points is shown below

$$\Lambda_0 = \begin{pmatrix}
1.00 & 0.02 & -0.16 & -0.09 & 0.17 \\
     & 1.00 & 0.27  & 0.09  & -0.03 \\
     &      & 1.00  & -0.03 & 0.06 \\
     &      &       & 1.00  & 0.01 \\
     &      &       &       & 1.00
\end{pmatrix}$$


With 84 data points, the matrix changes into

$$\Lambda_0 = \begin{pmatrix}
1.00 & 0.03 & -0.15 & -0.10 & 0.16 \\
     & 1.00 & 0.27  & 0.08  & -0.02 \\
     &      & 1.00  & -0.03 & 0.06 \\
     &      &       & 1.00  & 0.01 \\
     &      &       &       & 1.00
\end{pmatrix}$$



The elements vary in the second places below the decimal points, showing that 
the number of observations is too small for the matrix elements to have converged. 
In the first example, N is 74, i.e., the data from January 1975 to March 1981 
have been used to construct the Hankel matrix. The largest reduction in its 
singluar value occur from 0^, to 0 ^ the next largest drop arise going from a 
to Oy The dimension of the approximate model is taken to be two. The matrices 
A, C. M are calculated to be 



-0.15 0.18 

A = 

, 0.02 0.12 

-0.14 0.14 -0.03 -1.08 

^ 0.42 0.60 2.14 0.96 



0. 


.01 


-0. 


.10 


-0. 


.86 


0 . 


.10 


-0. 


.23 


0. 


.91 


-0. 


.06 


-0. 


.08 


0. 


.47 


0. 


,36 



-0.55 

/ 

0.30 
The solution of the Riccati equation is:

$$Z = \begin{pmatrix} 1.62 & 1.30 \\ & 1.93 \end{pmatrix}$$



In the second example, N is taken to be 84, i.e., the data from January 1975
to January 1982 have been used, also to construct a two-dimensional innovation
model. Its matrices are



A 



M = 



-0.16 

0.38 



-0.15 
^ 0.04 
0.06 
0.53 
0.00 



C 



- 0.88 



-0.18 



0.18 

0.16 

-0.09 

2.35 

-0.09 

0.16 

0.92 



-0.99 -0.51 

0.88 0.25 



-0.07 -0.07 

^ 0.45 0.33 



The Z matrix becomes

$$Z = \begin{pmatrix} 1.41 & -1.14 \\ & 2.56 \end{pmatrix}$$



When the current account is added as the sixth element, its correlations
with the other five elements are:

$$(0.03,\ 0.04,\ -0.46,\ 0.47,\ -0.57)'$$

Using the data January 1975 to March 1981, we construct the three-dimensional 
innovation model as the third example: 
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o 

o 


i 

o 

o 


vD 

O 

O 

1 


A = 


0.03 


0.06 


0.09 




^ 0.44 


0.11 


0.16 


0.33 


0.02 


-0.38 


0.34 


•0.24 


-0.22 


-0.78 


-0.52 


0.08 


0.09 


-2.51 


-0.07 




^ -0.03 


-0.10 


-0.11 


i 


-0.59 


0.55 


-0.40 




0.30 


0.96 


0.32 


c = 


-0.08 


-0.03 


-0.08 




0.58 


0.08 


-0.38 




l 


-0.59 


-0.47 


0.00 



The Z matrix is given by

$$Z = \begin{pmatrix} 3.06 & -11.95 & -25.28 \\ & 49.52 & 105.72 \\ & & 228.96 \end{pmatrix}$$



1.01 


0.56 


-3.38 


-1.20 


-5.63 


-2.81 



The two-dimensional innovation model with N = 74 is given, by construction, by the
(2 x 2) upper left-hand corner of A, the first two rows of M, and the first two
columns of C. Its Z matrix becomes

$$Z = \begin{pmatrix} 3.50 & -12.72 \\ & 51.03 \end{pmatrix}$$

This is the fourth example. The fifth model uses N = 84 with the five-dimensional
vector time series.



These models have been used to calculate the next 10 data points. The
models based on the data of January 1975 to March 1981 thus calculate values for
the period of April 1981 through January 1982. The models based on the data of
January 1975 through January 1982 predict values for the period of February
1982 through November 1982.
The models track trend growth paths of the Index of production, money stock and
WPI fairly well. The predictions settle on nearly constant values for the
exchange rate, the call rate and for the current account in the third and the fourth
model. With 74 data points, the three-dimensional model with the six-dimensional
data vector gives the smallest predicted value for the Index of production and
WPI. The predictions of the two-dimensional model with the six-dimensional data
vector lie in the middle, the highest predictions being generated by the
two-dimensional model with the five-dimensional data vector. The latter two models'
predicted money stocks nearly coincide. The first model gives the lowest predicted
money stock. Numerical values are given in Tables 4 and 5.

Table 4

Lowest predictions (dim y = 6, 3-dim model)

          Production
          Index %      M x 10^-4    WPI %

  4/81     144.8        213.3       131.4
  5/81     145.5        215.3       131.7
  6/81     146.3        217.3       132.2
  7/81     147.0        219.3       132.7
  8/81     147.8        221.3       133.2
  9/81     148.5        223.4       133.7
 10/81     149.3        225.4       134.2
 11/81     150.0        227.5       134.7
 12/81     150.8        229.6       135.2
  1/82     151.6        231.7       135.7





Table 5

Highest predictions (dim y = 6, 2-dim model)

          Production
          Index %      M x 10^-4    WPI %

  4/81     145.3        214.1       133.0
  5/81     146.0        216.1       133.4
  6/81     146.7        218.1       133.9
  7/81     147.5        220.1       134.4
  8/81     148.2        220.1       134.9
  9/81     149.0        224.1       135.4
 10/81     149.8        226.2       135.9
 11/81     150.5        228.3       136.4
 12/81     151.3        230.4       136.9
  1/82     152.1        232.5       137.4



Predictions of the exchange rate and the call rate settle down to 250.4 ¥/$
and 7.38% quickly in all the models. The current account seems to settle on the
level 139.8 M$. Table 6 lists the predicted values for the models.



Table 6

   N    dim y   dim z     EX      CALL     CUA

  74      5       2      250.4    7.38
  74      6       3      250.4    7.38    139.8
  74      6       2      250.4    7.38    139.8
  84      6       2      247.8    7.34    178.4
  84      5       2      247.8    9.34    139.0
The values of one element of C range from 0.85 to 0.96. A second element is,
in magnitude, about 0.86 ~ 0.88 for the two-dimensional models and about 0.6 for the
three-dimensional model. These figures indicate that the exchange rate and the
call rate or something close to them are being picked as the two state vector
components in the two-dimensional models. When the current account is added,
the exchange rate's influence on the first state vector component diminishes.
Fig. 11

Fig. 12
MATHEMATICAL APPENDICES



A.1 Solutions of Difference Equations



A difference equation generates a solution or an output sequence, $\{y_n\}$,
from another sequence $\{x_n\}$, called input or forcing terms. In general some
regularity conditions need be imposed to ensure the uniqueness and stability
of the solution sequences. Any sequence of exogenous variables can serve as
an input sequence. Often, predetermined or lagged endogenous variables appear
as parts of the forcing terms. A given input sequence $\{x_n\}$ is transformed into
the solution sequence $\{y_n\}$ via the relationships embodied in the difference
equation. We consider only linear difference equations where the transforming
relations are linear: Suppose the system is initially at rest.* A constant c
times $\{x_n\}$, i.e., the input sequence $\{c x_n\}$, then produces the solution sequence
$\{c y_n\}$. If two input sequences $\{x_n^{(1)}\}$ and $\{x_n^{(2)}\}$ respectively generate
$\{y_n^{(1)}\}$ and $\{y_n^{(2)}\}$ as the solution sequences, then the sequence
$\{x_n^{(1)} + x_n^{(2)}\}$ produces $\{y_n^{(1)} + y_n^{(2)}\}$ as the
corresponding solution sequence. This is known as the superposition principle
for linear systems.
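
The superposition principle can be checked numerically on the simplest linear difference equation, $y_t = \lambda y_{t-1} + x_t$ started at rest. The sketch below is only an illustration; the coefficient, the two input sequences and the function name are hypothetical choices of ours.

```python
def solve_first_order(lam, x, y0=0.0):
    """Return [y_1, ..., y_t] from y_t = lam * y_{t-1} + x_t, starting at y_0 = y0."""
    y, out = y0, []
    for x_t in x:
        y = lam * y + x_t
        out.append(y)
    return out

lam = 0.8                       # hypothetical coefficient
x1 = [1.0, 0.0, 2.0, -1.0]      # first input sequence
x2 = [0.5, 0.5, 0.0, 3.0]       # second input sequence
c = 2.5                         # scaling constant

y1 = solve_first_order(lam, x1)
y2 = solve_first_order(lam, x2)
y_sum = solve_first_order(lam, [p + q for p, q in zip(x1, x2)])
y_scaled = solve_first_order(lam, [c * p for p in x1])

# Superposition: the response to x1 + x2 equals y1 + y2, and to c * x1 equals c * y1.
print(max(abs(s - (p + q)) for s, p, q in zip(y_sum, y1, y2)))
print(max(abs(s - c * p) for s, p in zip(y_scaled, y1)))
```

Both printed discrepancies are at rounding-error level. With a non-zero initial condition the two properties fail, which is why the system is assumed to be initially at rest.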



Solution of Linear Difference Equations

We usually write difference equations with the input functions on the
right-hand side, all others on the left. The scalar difference equation
$a(L) y_t = x_t$, where $x_t$ and $y_t$ are real numbers and
$a(L) = 1 + a_1 L + \dots + a_p L^p$, is one such example where $x_t$ is the
forcing term or input function at time t. The right-hand side may take a more
complicated form such as $\beta(L) x_t$ for some polynomial $\beta(L)$,
the point being that the right-hand side (RHS) depends on the known function x
while y is to be solved for.



* No input sequence is present and only {0} appears as the solution sequence,
i.e., $x_n = 0$ and $y_n = 0$ for all $n \ge 0$.
Solutions of linear difference equations are made up of two parts: the
homogeneous part, i.e., solutions when the right-hand side of the equation is
zero, and the inhomogeneous part, i.e., solutions due to a non-zero input
sequence. The former arises if the system is not at rest, and is called the
zero-input solution. The latter is called the zero-state solution in the system
literature. Several solution methods are available. We discuss a transform
method and a direct, i.e., time-domain, solution method.

1st Order Equation: $|\lambda| < 1$

An example will illustrate our procedure. Suppose that we have a first
order difference equation, i.e., an equation with a single lag term

(1)      $(1 - \lambda L) y_t = x_t$.

Its homogeneous part is $(1 - \lambda L) y_t = 0$. The expression $y_t = \lambda^t c$ is a
solution where c is some constant. We can verify that it is a solution by substituting
this expression back into the equation:
$(1 - \lambda L)\lambda^t c = \lambda^t c - \lambda \lambda^{t-1} c = 0$. To fix
the constant, we need to specify the value of the solution at a point, usually
at the initial time t = 0, i.e., $y_0$ is given as an initial condition. With
this initial condition, the solution is unambiguously determined to be
$y_t = \lambda^t y_0$. Alternatively, the solution may be fixed at another time such as
a terminal time T. This terminal or boundary condition fixes the solution to be
$y_t = \lambda^{t-T} y_T$. These two expressions are equivalent. Knowing $y_T$ we can
determine $y_0$ and conversely. Let $y_0 = 0$. The solution with the non-zero input
is given by

(2)      $y_t = \sum_{\tau=1}^{t} \lambda^{t-\tau} x_\tau = \sum_{s=0}^{t-1} \lambda^{s} x_{t-s}$.

To see that this satisfies the difference equation, rewrite $y_t$ as
$y_t = \sum_{\tau=1}^{t-1} \lambda^{t-\tau} x_\tau + x_t$, and note that the first term can be
written as $\sum_{\tau=1}^{t-1} \lambda^{t-\tau} x_\tau = \lambda \left( \sum_{\tau=1}^{t-1} \lambda^{t-1-\tau} x_\tau \right)$,
where the expression in parentheses is recognized as $y_{t-1}$ from the supposed
solution (2), establishing that $y_t = x_t + \lambda y_{t-1}$.
The general solution is then made up of these two solutions

(3)      $y_t = \lambda^t y_0 + \sum_{\tau=1}^{t} \lambda^{t-\tau} x_\tau$   or   $\lambda^t y_0 + \sum_{s=0}^{t-1} \lambda^{s} x_{t-s}$.

The first part is the zero-input solution and the second the zero-state or
zero-initial condition solution.
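
As a quick check of the general solution, the closed form (zero-input part plus zero-state part) can be compared with direct iteration of $y_t = \lambda y_{t-1} + x_t$. The function names and numbers below are ours and purely illustrative.

```python
def closed_form(lam, x, y0, t):
    """General solution: lam**t * y0 + sum_{tau=1}^{t} lam**(t - tau) * x_tau."""
    # x[tau - 1] holds x_tau, so x[0] is x_1
    zero_input = lam**t * y0
    zero_state = sum(lam**(t - tau) * x[tau - 1] for tau in range(1, t + 1))
    return zero_input + zero_state

def recursion(lam, x, y0, t):
    """Iterate y_tau = lam * y_{tau-1} + x_tau for tau = 1, ..., t."""
    y = y0
    for tau in range(1, t + 1):
        y = lam * y + x[tau - 1]
    return y

lam, y0 = 0.5, 4.0              # hypothetical coefficient and initial condition
x = [1.0, 2.0, 3.0]             # x_1, x_2, x_3
for t in range(4):
    print(t, closed_form(lam, x, y0, t), recursion(lam, x, y0, t))
```

The two columns agree for every t, with t = 0 returning the initial condition itself.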

As t approaches infinity, the zero-input part of the solution remains bounded
for $y_0 \ne 0$ if and only if $|\lambda| \le 1$. With a bounded exogenous sequence, i.e.,
$|x_\tau| \le M$ for all $\tau$ and for some M, the second term remains bounded as t goes
to infinity for every such sequence if and only if $|\lambda| < 1$. Such a system is said
to be bounded-input bounded-output (BIBO) stable. If $|\lambda| < 1$, then the zero-input
part $\lambda^t y_0 \to 0$ as $t \to \infty$, whatever the bounded input sequence. The system
is then called asymptotically stable. With $|\lambda| > 1$, $y_0$ must be zero for the
zero-input part to remain bounded. Even when $|\lambda| > 1$, the second term
can remain bounded if $|x_\tau|$ goes to zero sufficiently fast.

Formally, the solution can be obtained by writing the original equation as
$y_t = \frac{1}{1 - \lambda L} x_t$. Expand $(1 - \lambda L)^{-1}$ as an infinite sum
$\sum_{\tau=0}^{\infty} (\lambda L)^\tau$, assuming that this sum is finite. Then we can write $y_t$ as

      $y_t = \sum_{\tau=0}^{\infty} \lambda^\tau L^\tau x_t$

          $= \sum_{\tau=0}^{t-1} \lambda^\tau L^\tau x_t + \sum_{\tau=t}^{\infty} \lambda^\tau L^\tau x_t$

          $= \sum_{\tau=0}^{t-1} \lambda^\tau x_{t-\tau} + \lambda^t \sum_{\tau=t}^{\infty} \lambda^{\tau-t} x_{t-\tau}$

          $= \sum_{\tau=0}^{t-1} \lambda^\tau x_{t-\tau} + \lambda^t \left\{ \sum_{s=0}^{\infty} \lambda^s x_{-s} \right\}$.

If we identify the bracketed expression in the second term as $y_0$, i.e.,
$y_0 = \sum_{s=0}^{\infty} \lambda^s x_{-s}$, then we can write
$y_t = \sum_{\tau=0}^{t-1} \lambda^\tau x_{t-\tau} + \lambda^t y_0$. This is exactly the
solution we obtained earlier. The above manipulation is legitimate then if
$\{x_t\}$ is a bounded sequence and if $|\lambda| < 1$. As a special case, suppose $x_t = a$
for all t. Then $(1 - \lambda L)^{-1} a = \left( \sum_{\tau=0}^{\infty} \lambda^\tau \right) a = (1 - \lambda)^{-1} a$
if $|\lambda| < 1$. The solution
becomes $y_t = a/(1 - \lambda)$ or
$y_t = \left( \sum_{\tau=0}^{t-1} \lambda^\tau \right) a + \lambda^t y_0 = (1 - \lambda^t)\, a/(1 - \lambda) + \lambda^t y_0$,
where $y_0 = a/(1 - \lambda)$.

The expansion into an infinite series implicitly assumes that the values
going back to $-\infty$ are available. When the series starts up from a finite past
point, the value there can be used as initial conditions.
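
The constant-input special case is easy to reproduce numerically. The sketch below, with hypothetical values of $\lambda$, a and $y_0$ of our choosing, iterates $y_t = \lambda y_{t-1} + a$ and compares the path with $(1 - \lambda^t)\, a/(1 - \lambda) + \lambda^t y_0$ and with the limit $a/(1 - \lambda)$.

```python
def constant_input_path(lam, a, y0, t_max):
    """Iterate y_t = lam * y_{t-1} + a and return [y_0, ..., y_{t_max}]."""
    path = [y0]
    for _ in range(t_max):
        path.append(lam * path[-1] + a)
    return path

lam, a, y0 = 0.5, 1.0, 0.0      # hypothetical values with |lam| < 1
path = constant_input_path(lam, a, y0, 50)
steady_state = a / (1.0 - lam)

for t in (1, 2, 3, 50):
    closed = (1.0 - lam**t) * a / (1.0 - lam) + lam**t * y0
    print(t, path[t], closed, steady_state)
```

Starting instead from $y_0 = a/(1 - \lambda)$ keeps the path at that value for all t, which is the first form of the solution quoted above.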

Transform Methods

The procedure of the previous section for solving difference equations by
formal expansion of $(1 - \lambda L)^{-1}$ is related to the solution method by z-transforms.
(The z-transform is the difference equation counterpart of the Laplace transform
method of solving differential equations.) The z-transform of a sequence $\{x_n\}$,
$n \ge 0$, is defined to be $X(z) = \sum_{n=0}^{\infty} x_n z^{-n}$. This is a formal series
in which z serves as a place marker. By examining the coefficient of $z^{-k}$ we can
identify $x_k$, for example. A brief discussion of z-transforms and their relations
to the lag and Fourier transforms is found in the Appendix.

Denote the z-transform of the solution (sequence) by Y(z), i.e.,
$Y(z) = \sum_{n=0}^{\infty} y_n z^{-n}$. To obtain the z-transform of $\{L y_n\}$, we define
another sequence $\{h_n\}$ by $h_n = L y_n = y_{n-1}$. Its z-transform is

      $H(z) = \sum_{n=0}^{\infty} h_n z^{-n} = \sum_{n=0}^{\infty} y_{n-1} z^{-n}
            = y_{-1} + \sum_{m=0}^{\infty} y_m z^{-m-1} = y_{-1} + z^{-1} Y(z)$.

Consider the difference equation (1). We have just shown that the z-transform of
the sequence $\{(1 - \lambda L) y_n\} = \{y_n - \lambda y_{n-1}\}$ equals
$Y(z) - \lambda \{ y_{-1} + z^{-1} Y(z) \}$. This must equal the z-transform of the RHS of
the difference equation, i.e., X(z). Equating the two, the resulting equation
$Y(z) - \lambda y_{-1} - \lambda z^{-1} Y(z) = X(z)$ can be solved for Y(z) to yield

      $Y(z) = \lambda y_{-1}/(1 - \lambda z^{-1}) + X(z)/(1 - \lambda z^{-1})$.

By examining the coefficients of the power $z^{-t}$ on both sides we can obtain
the expression for $y_t$. On the left-hand side (LHS), it is simply $y_t$ by the
construction of the z-transform. From the first term of the RHS we get $\lambda y_{-1} \lambda^t$
$= \lambda^{t+1} y_{-1}$. The second term of the RHS is
$(x_0 + x_1 z^{-1} + x_2 z^{-2} + \dots)(1 + \lambda z^{-1} + \lambda^2 z^{-2} + \dots)$.
Collecting the terms of the power $z^{-t}$ yields $\lambda^t x_0 + \lambda^{t-1} x_1 + \dots + x_t$.
Hence we recover the solution we earlier obtained by this method as well:

      $y_t = \lambda^{t+1} y_{-1} + \sum_{\tau=0}^{t} \lambda^\tau x_{t-\tau}$

          $= \lambda^t y_0 + \sum_{\tau=0}^{t-1} \lambda^\tau x_{t-\tau}$,

where $y_0 = \lambda y_{-1} + x_0$ has been substituted out. As in the earlier method, if we
wish to have well-defined z-transforms such as X(z) or Y(z), we can regard z as
the complex variable and suitably restrict the domain of z over which the series
are convergent.
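
The coefficient-matching step of the transform method amounts to multiplying two power series in $z^{-1}$, which is a convolution of their coefficient sequences. The sketch below illustrates this with `numpy.convolve`; the function names and the numbers are ours and hypothetical.

```python
import numpy as np

def solve_by_series(lam, x, y_minus1):
    """Coefficients y_0, ..., y_{n-1} of Y(z) = [lam * y_{-1} + X(z)] / (1 - lam z^{-1})."""
    n = len(x)
    geom = lam ** np.arange(n)            # 1, lam, lam^2, ...: expansion of 1/(1 - lam z^{-1})
    numer = np.array(x, dtype=float)      # x_0, x_1, ...: coefficients of X(z)
    numer[0] += lam * y_minus1            # the constant lam * y_{-1} joins the z^0 term
    return np.convolve(numer, geom)[:n]   # keep the powers z^0, ..., z^{-(n-1)}

def solve_by_recursion(lam, x, y_minus1):
    """Direct iteration of y_t = lam * y_{t-1} + x_t for t = 0, 1, ..."""
    y, out = y_minus1, []
    for x_t in x:
        y = lam * y + x_t
        out.append(y)
    return np.array(out)

lam, y_minus1 = 0.5, 4.0                  # hypothetical values
x = [1.0, 2.0, 3.0]                       # x_0, x_1, x_2
print(solve_by_series(lam, x, y_minus1))
print(solve_by_recursion(lam, x, y_minus1))
```

Truncating the geometric series at n terms is harmless here because higher powers of $z^{-1}$ cannot contribute to the first n coefficients.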



1st Order Equation: $|\lambda| > 1$

Ordinarily, we solve difference equations forward in time from some initial
time. In economics, however, we often want to solve difference equations backward
in time, relating the solution values to a future time instant such as the
end of a planning horizon as we do in dynamic programming. Our earlier method
treated time as flowing forward, i.e., knowing $y_0$ we determined $y_t$ using values
of input sequences $x_1, x_2, \dots, x_t$, t > 0. When we specify a value of the
solution at some future time T > 0, we are solving the difference equation backward
in time to obtain $y_t$, t < T, from the specified boundary or terminal condition
$y_T$.

The zero-input solution of $(1 - \lambda L) y_t = x_t$ is

(4)      $y_t = \lambda^{t-T} y_T = (1/\lambda)^{T-t} y_T$.

We see that $y_t$ goes to zero as $T \to \infty$ if $|\lambda| > 1$. If $|\lambda| < 1$, then $y_t$
diverges as T approaches infinity. To obtain the zero-state or zero-initial (now
zero-terminal) condition part of the solution, we measure time backward from T by
changing the time variable from t to s = T - t, and rename the variables:
$h_s = y_{T-s}$, $u_s = x_{T-s}$. The difference equation in these new variables is
$h_s - \lambda h_{s+1} = u_s$ or $h_{s+1} - (1/\lambda) h_s = -(1/\lambda) u_s$. This is the type
of equations we discussed in the previous section because $1/|\lambda| < 1$. We can write
its zero-input solution as $h_s = (1/\lambda)^s h_0$. In the original variables we recover
$y_{T-s} = (1/\lambda)^s y_T$ or $y_t = (1/\lambda)^{T-t} y_T$. Its zero-state solution is
$h_s = -(1/\lambda) \sum_{i=0}^{s-1} (1/\lambda)^i u_{s-i-1}$.* Converting back to the original
variables, this expression becomes
$y_{T-s} = -(1/\lambda) \sum_{i=0}^{s-1} (1/\lambda)^i x_{T-s+i+1}$. Renaming the time variable
T-s by t, we rewrite this solution as

(5)      $y_t = -\frac{1}{\lambda} \sum_{n=0}^{T-1-t} \left( \frac{1}{\lambda} \right)^n x_{t+n+1}$.

It shows that $y_t$ is affected by the future inputs, $x_{t+1}, \dots, x_T$,
rather than by current and past inputs as in the previous section.

The general solution combines (4) and (5):

(6)      $y_t = \left( \frac{1}{\lambda} \right)^{T-t} y_T - \frac{1}{\lambda} \sum_{i=0}^{T-1-t} \left( \frac{1}{\lambda} \right)^i x_{t+1+i}$.

(Of course (6) can be directly obtained by iterating backward from t = T.) This
form of the solution relates the current value of the solution $y_t$ to the terminal
value $y_T$ and the exogenous (input) variables that will occur between now, t, and
the future, T-1. Comparing (2) with (6), we note that $|1/\lambda|$ rather than $|\lambda|$ must
be less than 1 if $y_t$ is to remain bounded as $T \to \infty$. By letting T approach infinity
(5) becomes

(7)      $y_t = -\frac{1}{\lambda} \sum_{s=t}^{\infty} \left( \frac{1}{\lambda} \right)^{s-t} x_{s+1} = -\frac{1}{\lambda} \sum_{i=0}^{\infty} \left( \frac{1}{\lambda} \right)^i x_{t+1+i}$.



Formally, this form of solution can be obtained from the original difference
equation without these changes of variables by expanding $(1 - \lambda L)^{-1}$ not as
the power series in $(\lambda L)$ but rather as a formal power series in $(\lambda L)^{-1}$:
$(1 - \lambda L)^{-1} = -(\lambda L)^{-1} \{ 1 - (\lambda L)^{-1} \}^{-1}$, which becomes
$-\sum_{i=1}^{\infty} (\lambda L)^{-i}$. Then the solution of $(1 - \lambda L) y_t = x_t$ is written as

(7')     $y_t = (1 - \lambda L)^{-1} x_t = -\sum_{i=1}^{\infty} (\lambda L)^{-i} x_t = -\sum_{i=1}^{\infty} \lambda^{-i} x_{t+i}$

because $L^{-1}$ now stands for a forward shift of time index: $L^{-1} x_t = x_{t+1}$. This
expression is exactly (7).



* Compared with (1), the time index of u is off by one because
$(1 - \lambda^{-1} L) h_{s+1} = -(1/\lambda) u_s$ rather than $(1 - \lambda^{-1} L) h_s = -(1/\lambda) u_s$.



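By breaking up the infinite sum as

(8)      $y_t = -\frac{1}{\lambda} \sum_{i=0}^{T-1-t} \left( \frac{1}{\lambda} \right)^i x_{t+1+i} - \frac{1}{\lambda} \sum_{i=T-t}^{\infty} \left( \frac{1}{\lambda} \right)^i x_{t+1+i}$

and rearranging the second term as

      $-\frac{1}{\lambda} \sum_{i=T-t}^{\infty} \left( \frac{1}{\lambda} \right)^i x_{t+1+i} = -\frac{1}{\lambda} \sum_{j=0}^{\infty} \left( \frac{1}{\lambda} \right)^{T-t+j} x_{T+1+j} = \left( \frac{1}{\lambda} \right)^{T-t} y_T$,

we see that (7) can be put as (6), provided the infinite sum is absolutely
convergent (for example, x's being bounded and $|\lambda| > 1$). The z-transform
method of solving (1) for $|\lambda| > 1$ involves the same sort of manipulations.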
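
The backward solution can be verified by stepping equation (1) backward from the terminal condition and comparing with the closed form (6). The sketch below uses hypothetical numbers and our own function names; the list `x` is indexed so that `x[s]` holds $x_s$.

```python
def backward_recursion(lam, x, y_T, T, t):
    """Step (1) backward: y_{s-1} = (y_s - x_s) / lam, for s = T, T-1, ..., t+1."""
    y = y_T
    for s in range(T, t, -1):
        y = (y - x[s]) / lam
    return y

def formula_6(lam, x, y_T, T, t):
    """(6): y_t = (1/lam)^(T-t) y_T - (1/lam) sum_{i=0}^{T-1-t} (1/lam)^i x_{t+1+i}."""
    r = 1.0 / lam
    return r**(T - t) * y_T - r * sum(r**i * x[t + 1 + i] for i in range(T - t))

lam, T, y_T = 2.0, 3, 8.0       # hypothetical values with |lam| > 1
x = [0.0, 1.0, 2.0, 3.0]        # x[s] = x_s; x[0] is never used
for t in range(T + 1):
    print(t, backward_recursion(lam, x, y_T, T, t), formula_6(lam, x, y_T, T, t))
```

Because each backward step divides by $\lambda$, errors in the terminal value are damped when $|\lambda| > 1$, which is the numerical counterpart of the boundedness remark in the text.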



2nd Order Equation

We can solve higher-order difference equations for scalar variables in
several ways. The most systematic and theoretically satisfying way is to convert
them into first-order difference equations for suitably constructed vectors.
These vectors are the state vectors (of the system governed by the difference
equations in question). We can then appeal to a body of linear system theory to
obtain insight into solution behavior. Because this tack requires some knowledge
of system theory (as summarized in Aoki [1976; Part I], for example),
which is probably not familiar to the economic profession, we first proceed as
follows, using the z-transform or lag-transform to tackle a second-order difference
equation. We wish to solve

(9)      $a(L) y_t = x_t$

where

      $a(L) = 1 + a_1 L + a_2 L^2 = (1 - \lambda_1 L)(1 - \lambda_2 L)$.

Formally (9) leads to the solution

      $y_t = \frac{1}{(1 - \lambda_1 L)(1 - \lambda_2 L)} x_t$.

Expanding $1/[(1 - \lambda_1 L)(1 - \lambda_2 L)]$ into partial fractions,

      $\frac{1}{(1 - \lambda_1 L)(1 - \lambda_2 L)} = \frac{1}{1 - \lambda_2/\lambda_1} \cdot \frac{1}{1 - \lambda_1 L} + \frac{1}{1 - \lambda_1/\lambda_2} \cdot \frac{1}{1 - \lambda_2 L}$,

$y_t$ can be expressed as

      $y_t = \frac{1}{1 - \lambda_2/\lambda_1} \cdot \frac{1}{1 - \lambda_1 L} x_t + \frac{1}{1 - \lambda_1/\lambda_2} \cdot \frac{1}{1 - \lambda_2 L} x_t$.

Now if $|\lambda_1|$, $|\lambda_2|$ are both less than one, then our solution of the first
order equation (2) immediately leads to the solution of (9)

      $y_t = \frac{1}{1 - \lambda_2/\lambda_1} \sum_{i=0}^{\infty} \lambda_1^i x_{t-i} + \frac{1}{1 - \lambda_1/\lambda_2} \sum_{i=0}^{\infty} \lambda_2^i x_{t-i}$.

Now, by identifying $y_{01}$ with $\sum_{i=0}^{\infty} (1 - \lambda_2/\lambda_1)^{-1} \lambda_1^i x_{-i}$ and
$y_{02}$ with $\sum_{i=0}^{\infty} (1 - \lambda_1/\lambda_2)^{-1} \lambda_2^i x_{-i}$, we can write the above
equivalently as

(10)     $y_t = \sum_{i=0}^{t-1} h_i x_{t-i} + \lambda_1^t y_{01} + \lambda_2^t y_{02}$

where

      $h_i = (1 - \lambda_2/\lambda_1)^{-1} \lambda_1^i + (1 - \lambda_1/\lambda_2)^{-1} \lambda_2^i$.

This expression corresponds to (3). We note that we must now have two conditions
to fix the two constants (initial conditions) $y_{01}$ and $y_{02}$. They are written
here as two components of an initial condition vector.
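
The weights $h_i$ in (10) can be checked against the response of the second-order equation to a unit impulse. The sketch below computes $h_i$ both from the partial-fraction formula and by iterating $a(L) h_i = x_i$ with $x_0 = 1$ and zero otherwise; the roots are hypothetical, distinct and non-zero, and the function names are ours.

```python
def impulse_response_formula(lam1, lam2, n):
    """h_i = lam1^i / (1 - lam2/lam1) + lam2^i / (1 - lam1/lam2), i = 0, ..., n-1."""
    c1 = 1.0 / (1.0 - lam2 / lam1)
    c2 = 1.0 / (1.0 - lam1 / lam2)
    return [c1 * lam1**i + c2 * lam2**i for i in range(n)]

def impulse_response_recursion(lam1, lam2, n):
    """Iterate h_i = x_i - a1 h_{i-1} - a2 h_{i-2} with a unit impulse at i = 0."""
    a1, a2 = -(lam1 + lam2), lam1 * lam2   # a(L) = 1 + a1 L + a2 L^2
    h = []
    for i in range(n):
        x_i = 1.0 if i == 0 else 0.0
        h_1 = h[i - 1] if i >= 1 else 0.0
        h_2 = h[i - 2] if i >= 2 else 0.0
        h.append(x_i - a1 * h_1 - a2 * h_2)
    return h

lam1, lam2 = 0.5, 0.2                      # hypothetical roots, both inside the unit circle
print(impulse_response_formula(lam1, lam2, 6))
print(impulse_response_recursion(lam1, lam2, 6))
```

The first two weights are $h_0 = 1$ and $h_1 = \lambda_1 + \lambda_2$, as can also be seen by multiplying out the two geometric series.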



In (10), the zero-input solutions are represented by the last two terms.
Because $(1 - \lambda_1 L) \lambda_1^t = 0$ and $(1 - \lambda_2 L) \lambda_2^t = 0$, we have no doubt that
$(1 - \lambda_1 L)(1 - \lambda_2 L)(c_1 \lambda_1^t + c_2 \lambda_2^t) = 0$ for any $c_1$, $c_2$.
Zero-input solutions, hence general solutions,
of second-order equations need two conditions to fix the solution uniquely. A
specific solution is picked by fixing the solution sequence at any two points;
one condition could be specified at t = 0 and the other at t = T (some future
or terminal time).

The sequence $\{h_i\}$ is the impulse response sequence which represents the
dynamic (i > 0) and impact (i = 0) multiplier effects of the exogenous variable
x on y. If both $|\lambda_1|$ and $|\lambda_2|$ are greater than one, then our solution (6) or
(7) suggests that we write 1/a(L) as

      $\frac{1}{(1 - \lambda_1 L)(1 - \lambda_2 L)} = \frac{1}{1 - \lambda_2/\lambda_1} \cdot \frac{-(\lambda_1 L)^{-1}}{1 - (\lambda_1 L)^{-1}} + \frac{1}{1 - \lambda_1/\lambda_2} \cdot \frac{-(\lambda_2 L)^{-1}}{1 - (\lambda_2 L)^{-1}}$

so that $y_t$ is formally written as

      $y_t = -(\lambda_1 - \lambda_2)^{-1} \sum_{i=0}^{\infty} \left( \frac{1}{\lambda_1} \right)^i x_{t+1+i} - (\lambda_2 - \lambda_1)^{-1} \sum_{i=0}^{\infty} \left( \frac{1}{\lambda_2} \right)^i x_{t+1+i}$.

Or if we wish, we can break up each of the two infinite sums into two parts as
in (8) and write the above as

      $y_t = \left( \frac{1}{\lambda_1} \right)^{T-t} y_{T1} + \left( \frac{1}{\lambda_2} \right)^{T-t} y_{T2} - (\lambda_1 - \lambda_2)^{-1} \sum_{i=0}^{T-1-t} \left( \frac{1}{\lambda_1} \right)^i x_{t+1+i}$

            $- (\lambda_2 - \lambda_1)^{-1} \sum_{i=0}^{T-1-t} \left( \frac{1}{\lambda_2} \right)^i x_{t+1+i}$.

Again two constants need be specified. They are expressed here as two components
of a vector specified at the terminal time T.

Suppose $|\lambda_1| < 1 < |\lambda_2|$. Then it is sometimes convenient to express $y_t$ as

      $(1 - \lambda_1 L) y_t = x_t/(1 - \lambda_2 L)$

            $= -\frac{1}{\lambda_2} \sum_{i=0}^{T-1-t} \left( \frac{1}{\lambda_2} \right)^i x_{t+1+i} + c \left( \frac{1}{\lambda_2} \right)^{T-t}$
where c is to be determined by a condition specified at T. Or letting $T \to \infty$,
we have

      $(1 - \lambda_1 L) y_t = -\lambda_2^{-1} \sum_{i=0}^{\infty} \lambda_2^{-i} x_{t+1+i}$.

Then $y_t$ is expressible as

      $y_t = \lambda_1 y_{t-1} - \lambda_2^{-1} \sum_{i=0}^{\infty} \lambda_2^{-i} x_{t+1+i}$.

This form is useful if $y_{t-1}$ is known. Then $y_t$ is determined by the future stream
of x's. When x is stochastic, this form or a slightly altered version of it often
appears as the one-step ahead prediction formula of (now random) $y_t$, given $y_{t-1}$.
Such examples are found in Sargent [1979] and elsewhere in these lecture notes
as well.
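
The mixed case, with one root inside and one outside the unit circle, can be checked numerically: $y_t$ built from the lagged value and the discounted future inputs should satisfy the original second-order equation. The sketch below assumes, for illustration only, that the inputs are zero beyond the end of the sample so that the infinite sum is finite; all names and numbers are ours.

```python
def solve_mixed(lam1, lam2, x, y_init):
    """y_t = lam1 * y_{t-1} - (1/lam2) * sum_i lam2^(-i) x_{t+1+i}, with x zero past its end."""
    n = len(x)
    y_prev, ys = y_init, []
    for t in range(n):
        # discounted sum of the inputs that are still in the future at time t
        future = sum(lam2 ** (-i) * x[t + 1 + i] for i in range(n - t - 1))
        y_t = lam1 * y_prev - future / lam2
        ys.append(y_t)
        y_prev = y_t
    return ys

lam1, lam2 = 0.5, 2.0                    # hypothetical roots: |lam1| < 1 < |lam2|
x = [1.0, -2.0, 0.5, 3.0, 0.0, 1.5]      # x_0, ..., x_5
y_init = 0.7                             # y_{-1}
ys = solve_mixed(lam1, lam2, x, y_init)

# Residual of a(L) y_t = x_t with a(L) = (1 - lam1 L)(1 - lam2 L), for t >= 1
for t in range(1, len(x)):
    y_lag2 = ys[t - 2] if t >= 2 else y_init
    resid = ys[t] - (lam1 + lam2) * ys[t - 1] + lam1 * lam2 * y_lag2 - x[t]
    print(t, resid)
```

The residuals are at rounding-error level, which confirms that the stable root is solved forward in time and the unstable root backward.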



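State Space Representation

We now solve (9) using state space. Define a vector $s_t$ by $s_t = (y_t, y_{t-1})'$.
The second-order difference equation (9) then is equal to

      $\begin{pmatrix} y_t \\ y_{t-1} \end{pmatrix} = \begin{pmatrix} -a_1 & -a_2 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} y_{t-1} \\ y_{t-2} \end{pmatrix} + \begin{pmatrix} 1 \\ 0 \end{pmatrix} x_t$,

or

(11)     $s_t = A s_{t-1} + b x_t$

where

      $A = \begin{pmatrix} -a_1 & -a_2 \\ 1 & 0 \end{pmatrix}$   and   $b = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$.

The characteristic polynomial of A, $|\lambda I - A|$, is a second-order polynomial in $\lambda$,
$\lambda^2 + a_1 \lambda + a_2$. Its roots are the eigenvalues of A, which are exactly given by
$\lambda_1$ and $\lambda_2$ we have earlier used to factor the lag polynomial a(L) of (9). Now (11)
produces a vector version of the 1st order difference equation in lag-operator
form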
(12)     $(I - AL) s_t = b x_t$.

Here AL is understood to equal $\begin{pmatrix} -a_1 L & -a_2 L \\ L & 0 \end{pmatrix}$. This
corresponds to (1) we have earlier discussed. Eq. (12) is formally solved as
$s_t = (I - AL)^{-1} b x_t$. From the matrix identity
$(I - AL)^{-1} = \mathrm{adj}(I - AL)/|I - AL|$ where
$|I - AL| = 1 + a_1 L + a_2 L^2 = (1 - \lambda_1 L)(1 - \lambda_2 L)$, we can write $s_t$ as

      $s_t = \frac{1}{(1 - \lambda_1 L)(1 - \lambda_2 L)} \begin{pmatrix} 1 \\ L \end{pmatrix} x_t$.

Componentwise, this is nothing but $y_t = [(1 - \lambda_1 L)(1 - \lambda_2 L)]^{-1} x_t$ and
$y_{t-1} = [(1 - \lambda_1 L)(1 - \lambda_2 L)]^{-1} L x_t$.

The solution of (12) can be put then as

(13)     $s_t = A^t s_0 + \sum_{\tau=1}^{t} A^{t-\tau} b x_\tau$

             $= A^t s_0 + \sum_{\tau=0}^{t-1} A^\tau b x_{t-\tau}$.

Suppose that A has two linearly independent eigenvectors $u_1$ and $u_2$ so that
$A[u_1, u_2] = [u_1, u_2] \Lambda$ where $\Lambda = \mathrm{diag}(\lambda_1, \lambda_2)$.
Define $v_1$ and $v_2$ by $[u_1, u_2]^{-1} = \begin{pmatrix} v_1' \\ v_2' \end{pmatrix}$. The matrix A
is then expressible as

      $A = [u_1, u_2] \Lambda \begin{pmatrix} v_1' \\ v_2' \end{pmatrix} = \sum_{i=1}^{2} \lambda_i u_i v_i'$.

The vector $v_i'$ is the left (row) eigenvector, $v_i' A = \lambda_i v_i'$. This is an example of
the spectral decomposition representation discussed in the Appendix. We note that
$v_i' u_j = \delta_{ij}$ from the construction. Because of this, the power $A^n$ has
$\sum_{i=1}^{2} \lambda_i^n u_i v_i'$ as its spectral representation. Then (13) can be written as

      $s_t = \sum_{i=1}^{2} \lambda_i^t u_i (v_i' s_0) + \sum_{\tau=0}^{t-1} \sum_{i=1}^{2} \lambda_i^\tau u_i (v_i' b) x_{t-\tau}$.
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We note that is absent from if v^s^ is zero. If v^s^ is zero, then A^ is 
absent from the first summation. We see that the initial condition vector s^ 
which is orthogonal to v_^ does not excite the i-th term. This observation 
generalizes to n-dimensional problems. We refer to the i-th term as the i-th 
mode of the dynamic system. 
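
The modal form of the solution can be checked numerically. The following is a minimal sketch, assuming (9) has the form $y_t + a_1 y_{t-1} + a_2 y_{t-2} = x_t$ and that A has distinct eigenvalues; the function names and the test values are illustrative choices, not part of the original notes.

```python
import numpy as np

def simulate_state(a1, a2, x, s0):
    """Iterate s_t = A s_{t-1} + b x_t for t = 1, ..., len(x) - 1 (x[0] is unused)."""
    A = np.array([[-a1, -a2], [1.0, 0.0]])
    b = np.array([1.0, 0.0])
    s = np.asarray(s0, dtype=float)
    for t in range(1, len(x)):
        s = A @ s + b * x[t]
    return s

def modal_state(a1, a2, x, s0):
    """Evaluate the same state from the spectral (modal) form of the solution."""
    A = np.array([[-a1, -a2], [1.0, 0.0]])
    b = np.array([1.0, 0.0])
    s0 = np.asarray(s0, dtype=float)
    lam, U = np.linalg.eig(A)      # columns of U are the right eigenvectors u_i
    V = np.linalg.inv(U)           # rows of V are the left eigenvectors v_i'
    t = len(x) - 1
    s = np.zeros(2, dtype=complex)
    for i in range(2):
        s += lam[i] ** t * U[:, i] * (V[i] @ s0)                   # initial-condition part
        for tau in range(t):
            s += lam[i] ** tau * U[:, i] * (V[i] @ b) * x[t - tau]  # forced part
    return s.real
```

With $a_1 = -0.9$ and $a_2 = 0.2$ the eigenvalues are 0.5 and 0.4, and the two functions should agree for any input sequence and initial state.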







A.2 Geometry of Weakly Stationary Stochastic Sequences

Economic agents often face a noisy environment and must extract information
useful for their decision problems from it. The totality of mean-zero random
vectors with finite variances can be made a Hilbert space by defining an inner
product of two members x and y by $(x, y) = E\,x'y$. Two random vectors are
orthogonal if the inner product is zero, i.e., if they are uncorrelated.

Suppose $y_t$ is observed which is related to a basic or elementary random
sequence, $\varepsilon_t$, $E\varepsilon_t = 0$, $E\varepsilon_t \varepsilon_s = \sigma^2 \delta_{t,s}$, by a moving average process

$$y_t = \phi(L)\, \varepsilon_t \tag{1}$$

where

$$\phi(L) = \sum_{j=0}^{\infty} \phi_j L^j, \quad \phi_0 = 1.$$

For this to generate a covariance stationary process, we assume that
$\sum_{j=0}^{\infty} \phi_j^2$ is finite.

Let us consider predicting $y_{t+m}$, m > 0, based on the information set
$I_t = \{y_t, y_{t-1}, y_{t-2}, \ldots\}$, as a typical or one of the basic problems of
information extraction faced by economic agents. We consider only a linear
prediction formula in which $y_{t+m}$ is estimated by a linear combination of
current and past y's.

Here we assume that $\phi(L)$ is invertible in the sense that $\phi(L)^{-1} = \psi(L)$ exists
such that $\psi(L) = \sum_{j=0}^{\infty} \psi_j L^j$, $\sum_{j=0}^{\infty} \psi_j^2 < \infty$. In other words, the $\{y_t\}$ process of
(1) can equivalently be expressed as an autoregressive process

$$\psi(L)\, y_t = \varepsilon_t. \tag{2}$$

A simple example $y_t = \varepsilon_t - \varepsilon_{t-1}$ shows that not all moving average processes are
invertible. However, invertible processes do constitute an important class of
stochastic processes. For such processes, the information set $I_t$ is equivalent
to the one containing $I_t^{\varepsilon} = \{\varepsilon_t, \varepsilon_{t-1}, \ldots\}$. Then a linear prediction of $y_{t+m}$
based on $I_t^{\varepsilon}$ is of the form
$$\hat{y}_{t+m|t} = A(L)\, \varepsilon_t$$

where

$$A(L) = a_0 + a_1 L + a_2 L^2 + \cdots.$$

The best such prediction is, by definition, the one that minimizes the prediction
error variance

$$\sigma_y^2 = E[(y_{t+m} - \hat{y}_{t+m|t})^2 \mid I_t] = E[\{\phi(L)\varepsilon_{t+m} - A(L)\varepsilon_t\}^2 \mid I_t].$$

Separate out the future $\varepsilon$'s from those in the information set $I_t$ by writing

$$\phi(L)\, \varepsilon_{t+m} = \sum_{j=0}^{m-1} \phi_j \varepsilon_{t+m-j} + \sum_{j=m}^{\infty} \phi_j \varepsilon_{t+m-j} = \sum_{j=0}^{m-1} \phi_j \varepsilon_{t+m-j} + \phi_m(L)\, \varepsilon_t,$$

where

$$\phi_m(L) = \sum_{i=0}^{\infty} \phi_{m+i} L^i.$$

Then

$$\sigma_y^2 = \sum_{j=0}^{m-1} \phi_j^2 \sigma^2 + [\{\phi_m(L) - A(L)\}\varepsilon_t]^2.$$

The choice $A(L) = \phi_m(L)$ clearly minimizes $\sigma_y^2$, i.e.,

$$\hat{y}_{t+m|t} = \phi_m(L)\, \varepsilon_t \tag{3}$$

is the best least-squares predictor of $y_{t+m}$ based on information contained in $I_t$.
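
As a small illustration of (3), the predictor weights and the minimized error variance can be read directly off the moving average coefficients. This is a sketch under the assumption that the $\phi_j$ are available as a finite (truncated) array; the helper names are hypothetical.

```python
import numpy as np

def predictor_weights(phi, m):
    """Coefficients of phi_m(L) = sum_i phi_{m+i} L^i applied to eps_t, eps_{t-1}, ..."""
    return np.asarray(phi, dtype=float)[m:]

def prediction_error_variance(phi, m, sigma2=1.0):
    """sigma^2 * sum_{j<m} phi_j^2: the variance contributed by the future shocks."""
    phi = np.asarray(phi, dtype=float)
    return sigma2 * np.sum(phi[:m] ** 2)

# Example: phi_j = 0.8**j (truncated at 50 terms), two-step-ahead prediction.
phi = 0.8 ** np.arange(50)
weights = predictor_weights(phi, 2)
error_var = prediction_error_variance(phi, 2)
```

For this example the leading weights are 0.64, 0.512, ... and the error variance is $1 + 0.8^2 = 1.64$ when $\sigma^2 = 1$.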

Sometimes a notation or an (annihilator) operator $[\,\cdot\,]_+$ is used to express
$\phi_m(L)$ as $[\phi(L)/L^m]_+$, where $[\,\cdot\,]_+$ collects only non-negative powers of L, dropping
all expressions with negative powers of L. Then we can write the best l.s. predictor
as $[\phi(L)/L^m]_+ \varepsilon_t$. Non-negative powers of L refer to current and past values.
Recalling that $z^{-1}$ corresponds to L, we can equivalently define $[\ ]_+$ as collecting
non-negative powers in $z^{-1}$ for any expression expressed as a (formal) power series
in $z^{-1}$. For example, we write

$$F(z) = [F(z)]_+ + [F(z)]_-$$

where

$$F(z) = \sum_{n=-\infty}^{\infty} f_n z^{-n}$$

and

$$[F(z)]_+ = \sum_{n=0}^{\infty} f_n z^{-n}, \qquad [F(z)]_- = \sum_{n=1}^{\infty} f_{-n} z^{n}.$$

The operation $[\ ]_+$ picks out the causal (i.e., realizable) portion of the transfer
function $[\ ]$. When basic or fundamental noise sequences are involved, $[\ ]_+$
realizes the orthogonal projection onto the subspace spanned by the data.

Examine further the expression in (3). The variable to be predicted, $y_{t+m}$, can
be written as $\phi(L) L^{-m} \varepsilon_t$ because $L^{-m}\varepsilon_t$ is $\varepsilon_{t+m}$. The best predictor, $[\phi(L)/L^m]_+ \varepsilon_t$,
drops from $\phi(L) L^{-m} \varepsilon_t$ the random variables $\varepsilon_{t+m}, \varepsilon_{t+m-1}, \ldots, \varepsilon_{t+1}$. Because $\varepsilon_{t+1}, \ldots,
\varepsilon_{t+m}$ are uncorrelated with the $\varepsilon$'s in the information set $I_t$, the dropping of
these uncorrelated random variables is equivalent to taking the orthogonal projection
of them onto the subspace spanned by the $\varepsilon$'s in $I_t$. The operator $[\phi(L)/L^m]_+$ is the
orthogonal projection operator. Alternately put, the best predictor $\hat{y}_{t+m|t}$ is
such that the prediction error $y_{t+m} - \hat{y}_{t+m|t}$ is orthogonal to (uncorrelated with)
all the $\varepsilon$'s in $I_t$, or, by the equivalence of the subspace spanned by $\varepsilon_t, \varepsilon_{t-1}, \ldots$
with that spanned by $y_t, y_{t-1}, \ldots$, $y_{t+m} - \hat{y}_{t+m|t}$ is orthogonal to $y_{t-\tau}$, $\tau = 0,
1, \ldots$.

This fact is sometimes referred to as the orthogonality principle: Consider
a collection of random variables $x_1, \ldots, x_n$ which are used to estimate another
random variable y in the minimum mean square sense, i.e.,

$$\min_{\{a_i\}} \Big(y - \sum_i a_i x_i,\; y - \sum_i a_i x_i\Big) = \min_{\{a_i\}} E\Big(y - \sum_{i=1}^{n} a_i x_i\Big)^2.$$

Examining a slight change in $a_i$ from its optimal value $a_i^0$, we note that the
coefficients are optimal if and only if

$$0 = \Big(y - \sum_{i=1}^{n} a_i^0 x_i,\; x_j\Big), \quad j = 1, \ldots, n.$$

In words, the optimal estimation error, $y - \sum_i a_i^0 x_i$, is orthogonal to every
vector $x_1, \ldots, x_n$. See Achieser [1956, Chapter 1], for example, for further
discussion.
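
The orthogonality principle has a direct sample analogue: with sample averages in place of expectations, the optimal coefficients solve the normal equations and the residual is orthogonal to every regressor. A minimal sketch with simulated data (the data-generating values below are arbitrary assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs, n = 5000, 3
X = rng.standard_normal((n_obs, n))            # columns play the role of x_1, ..., x_n
y = X @ np.array([1.0, -2.0, 0.5]) + rng.standard_normal(n_obs)

# Normal equations: the sample version of (y - sum_i a_i x_i, x_j) = 0 for all j.
a_opt = np.linalg.solve(X.T @ X, X.T @ y)
residual = y - X @ a_opt
inner_products = X.T @ residual / n_obs        # sample inner products with each x_j
```

The entries of `inner_products` should vanish up to rounding error.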
A.3 Principal Components

Definition. The principal components of a p-dimensional random vector x with
mean zero and covariance matrix $\Sigma$ are defined as linear combinations of the
components of the vector x

$$v = \Gamma' x$$

where $\Gamma$ is the matrix made up of the (normalized) eigenvectors of $\Sigma$,
$\Gamma = [\gamma_1, \gamma_2, \ldots, \gamma_p]$, i.e.,

$$\Sigma \Gamma = \Gamma \Lambda, \quad \Gamma' \Gamma = I$$

where $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_p)$.

The i-th component of the vector v, $v_i$, is called the i-th principal component
of x.

This definition makes clear that the principal components are the coordinates
of x with respect to the basis composed of the eigenvectors of $\Sigma$, because $x = \Gamma c$
implies that $v = \Gamma' x = \Gamma' \Gamma c = c$ if $\Gamma' \Gamma = I$, i.e., v is the representation of x
with respect to the basis $\Gamma$.

The principal components are orthogonal because

$$\mathrm{cov}(v) = \Gamma' E(xx') \Gamma = \Gamma' \Sigma \Gamma = \Lambda.$$

The variance of $v_i$ is the i-th eigenvalue of $\Sigma$. From $\mathrm{tr}\, \Sigma = \sum_i \lambda_i$, the sum of
all the variances remains fixed.
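
The definition can be illustrated numerically. The sketch below uses an arbitrary positive semidefinite matrix as the covariance (an assumption for illustration) and `numpy.linalg.eigh`, which returns eigenvalues in ascending order, so they are reordered to put the largest first.

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((4, 4))
Sigma = M @ M.T                       # an arbitrary covariance matrix (assumed example)

lam, Gamma = np.linalg.eigh(Sigma)    # symmetric eigenproblem, ascending eigenvalues
order = np.argsort(lam)[::-1]         # reorder so that lambda_1 is the largest
lam, Gamma = lam[order], Gamma[:, order]

cov_v = Gamma.T @ Sigma @ Gamma       # covariance of v = Gamma' x
```

Here `cov_v` should be diagonal with the eigenvalues on the diagonal, and its trace should equal the trace of `Sigma`.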

Optimality Properties

Principal components possess several optimal properties; Rao [1964], for example,
discusses them. To illustrate, consider

$$\max\, (\mathrm{var}\, h'x : h'h = 1).$$

Because $\mathrm{var}\, h'x = h' \Sigma h$, the choice of h to be the normalized eigenvector of $\Sigma$
with the largest eigenvalue $\lambda_1$ achieves this maximum. If the sum of the remaining
eigenvalues $\lambda_2 + \ldots + \lambda_p$ is negligible compared with $\lambda_1$, then most of the
variance of x is explained by $\gamma_1' x$, i.e., by the first principal component of x.
If not, we can continue the search for a small number of variables that account
for most of the variance of x by choosing $h'x$ to satisfy

$$\max\, (\mathrm{var}\, h'x : h'h = 1,\; h' \Sigma \gamma_1 = 0).$$

The condition $h' \Sigma \gamma_1 = 0$ means that $h'x$ is uncorrelated with $\gamma_1' x$.
Because $\gamma_2$ is orthogonal to $\gamma_1$, $\gamma_2' x$ achieves the maximum.

Alternatively, the spectral decomposition of $\Sigma$,

$$\Sigma = \sum_{i=1}^{p} \lambda_i \gamma_i \gamma_i',$$

shows that $\lambda_1$ is the contribution by the first principal component, and so on,
in explaining the total variance $\mathrm{tr}\, \Sigma$.

Another optimization problem that is solved by the principal components is
to approximate $\Sigma$ by another p x p matrix B of rank < p:

$$\min_B \|\Sigma - B\|$$

where $\|\Sigma - B\|^2 = \mathrm{tr}\, (\Sigma - B)(\Sigma - B)'$, rank B = q < p.

Noting that $\Gamma$ introduced earlier is orthogonal,

$$\mathrm{tr}\, (\Sigma - B)(\Sigma - B)' = \mathrm{tr}\, \Gamma\Gamma' (\Sigma - B) \Gamma\Gamma' (\Sigma - B)' = \mathrm{tr}\, \Gamma' (\Sigma - B) \Gamma\Gamma' (\Sigma - B)' \Gamma = \mathrm{tr}\, (\Lambda - G)(\Lambda - G)'$$

where $G = \Gamma' B \Gamma$ and rank G = rank B because $\Gamma$ is nonsingular.

Then from the relation

$$\|\Sigma - B\|^2 = \sum_{i=1}^{p} (\lambda_i - g_{ii})^2 + \sum_{i \ne j} g_{ij}^2,$$

the above expression is clearly minimized by choosing

$$g_{ii} = \lambda_i, \; i = 1, \ldots, q, \qquad g_{ij} = 0, \; i \ne j,$$

i.e.,

$$B = \sum_{i=1}^{q} \lambda_i \gamma_i \gamma_i'.$$
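
The rank-q approximation can be sketched as follows, assuming the eigenvalues are ordered so that the q largest are retained; the function name is a hypothetical choice. The squared approximation error should then equal the sum of the squared discarded eigenvalues.

```python
import numpy as np

def best_rank_q(Sigma, q):
    """Keep the q largest eigenvalues of the symmetric matrix Sigma: B = sum_i lam_i g_i g_i'."""
    lam, Gamma = np.linalg.eigh(Sigma)
    idx = np.argsort(lam)[::-1][:q]              # indices of the q largest eigenvalues
    return (Gamma[:, idx] * lam[idx]) @ Gamma[:, idx].T

rng = np.random.default_rng(2)
M = rng.standard_normal((5, 5))
Sigma = M @ M.T                                  # assumed example covariance matrix
B = best_rank_q(Sigma, 2)
lam_sorted = np.sort(np.linalg.eigvalsh(Sigma))[::-1]
squared_error = np.sum((Sigma - B) ** 2)         # tr (Sigma - B)(Sigma - B)'
```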



We next show that the internally balanced model of Moore [1976] is an application
of the principal components.
Let

$$C = [B, AB, \ldots]$$

and

$$u_t = \begin{pmatrix} e_t \\ e_{t-1} \\ \vdots \end{pmatrix}.$$

Then the zero-state solution of a linear dynamics excited by a sequence of
random impulses is

$$y_t = C u_t$$

where we choose

$$E\, u_t u_t' = I.$$

The output covariance matrix is

$$E\, y_t y_t' = C\, E(u_t u_t')\, C' = CC' = G_c : n \times n,$$

where $G_c$ is the controllability grammian.

Let $\Gamma$ be the matrix made up of n normalized eigenvectors of $G_c$,

$$G_c \Gamma = \Gamma \Sigma$$

where $\Sigma = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_n^2)$.

The vector $\Gamma' y_t$ is the vector of the principal components,

$$v = \Gamma' y_t,$$

or

$$y_t = \Gamma v = \sum_i \gamma_i v_i.$$

For example, the first component of v is expressible as

$$v_1 = \gamma_1' y_t$$

and

$$\mathrm{cov}\, v_1 = \gamma_1' CC' \gamma_1 = \sigma_1^2.$$
A.4 Fourier Transforms

Preliminary Notions

A periodic function of time indefinitely repeats a basic pattern defined
over a finite interval such as [0, T] or [-T/2, T/2], i.e., x(t+T) = x(t) where
T is called the period.

Any periodic function has a Fourier series expansion

$$x(t) = \sum_{n=-\infty}^{\infty} c_n e^{j 2\pi n t / T} \tag{1}$$

provided the integral exists for the Fourier coefficient

$$c_n = \frac{1}{T} \int_{-T/2}^{T/2} x(t)\, e^{-j 2\pi n t / T}\, dt \tag{2}$$

and $\sum_n |c_n|^2 < \infty$.

Here the basic pattern is taken to be given over the interval [-T/2, T/2]. If
the pattern is defined over [0, T], then integrate over [0, T] to define the
expansion coefficient.

More generally, the Fourier transform of any function x(t) is defined by

$$X(\omega) = \int_{-\infty}^{\infty} x(t)\, e^{-j\omega t}\, dt \tag{3}$$

if the integral exists, for example if $\int_{-\infty}^{\infty} |x(t)|^p\, dt$ is finite for $1 \le p \le 2$.

The original function is recoverable from the Fourier transform by

$$x(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} X(\omega)\, e^{j\omega t}\, d\omega. \tag{4}$$

This is exact if x(·) is continuous. The right-hand side defines {x(t+0)
+ x(t-0)}/2 if x(·) is discontinuous at t. Comparing (1) with (4), $X(\omega)$ is
seen to correspond to the Fourier series expansion coefficient. This correspondence
can be made plausible by the following arguments: Use the segment of
x(t) over [-T/2, T/2] to construct a periodic function $x_T(t)$. Its Fourier
series expansion is as in (1), i.e.,

$$x_T(t) = \sum_{n=-\infty}^{\infty} c_n(T)\, e^{j 2\pi n t / T}, \qquad c_n(T) = \frac{1}{T} \int_{-T/2}^{T/2} x(t)\, e^{-j 2\pi n t / T}\, dt. \tag{5}$$

Let $X_T(2\pi n / T) = T c_n(T)$, and denote $2\pi / T$ by $\Delta\omega$. Using these symbols, (5)
is written as

$$x_T(t) = \frac{\Delta\omega}{2\pi} \sum_{n=-\infty}^{\infty} X_T(2\pi n / T)\, e^{j 2\pi n t / T}, \tag{5'}$$

which has the form of a series approximation to an integral. Letting $T \to \infty$ and
assuming the convergence of the infinite sum

$$\Delta\omega \sum_{n} X_T(2\pi n / T)\, e^{j 2\pi n t / T} \to \int_{-\infty}^{\infty} X(\omega)\, e^{j\omega t}\, d\omega,$$

we see that as T approaches infinity

$$x_T(t) \to x(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} X(\omega)\, e^{j\omega t}\, d\omega.$$

Time Series Data

So far we have treated x(·) as a continuous function of time. Now suppose
a time series is given, $\{x_n, n = 0, \pm 1, \pm 2, \ldots\}$. If we think of $x_n$ as x(nT)
of some function x(·) with a sampling period T, then (4) evaluated at t = nT
gives us

$$x_n = \frac{1}{2\pi} \int_{-\infty}^{\infty} X(\omega)\, e^{j n \omega T}\, d\omega.$$

Breaking up the interval of integration into segments of length $2\pi / T$ each, write
the above as

$$x_n = \frac{1}{2\pi} \sum_{m=-\infty}^{\infty} \int_{(2m-1)\pi/T}^{(2m+1)\pi/T} X(\omega)\, e^{j n \omega T}\, d\omega = \frac{T}{2\pi} \int_{-\pi/T}^{\pi/T} X^*(\omega)\, e^{j n \omega T}\, d\omega \tag{6}$$

where the auxiliary function, $X^*(\omega)$, is introduced by

$$X^*(\omega) = \frac{1}{T} \sum_{m=-\infty}^{\infty} X(\omega + 2\pi m / T).$$

If $|X(\omega)| = 0$ for $|\omega| \ge \pi / T$, then $X^*(\omega) = \frac{1}{T} X(\omega)$. Otherwise the values of X at
$\omega + 2\pi m / T$, $m \ne 0$, also contribute to $X^*(\omega)$. (This is known as aliasing.)

Aside from changes of variables, (6) corresponds to (2). Then to (1) corresponds
the expression

$$X^*(\omega) = \sum_{n=-\infty}^{\infty} x_n e^{-j \omega n T}. \tag{7}$$

This is the discrete version of the Fourier transform (DFT). Note that it is
the z-transform of $\{x_n\}$ evaluated at $z = e^{j\omega T}$. The z-transforms are discussed
in the next appendix.



Finite Data

Suppose that we know x(·) only over [0, NT] for some NT, and construct
$x_{NT}(\cdot)$ to be the periodic function with period NT by repeatedly copying x(·)
over $(-\infty, \infty)$. Thus, the Fourier series expansion

$$x_{NT}(t) = \sum_{k=-\infty}^{\infty} c_k(NT)\, e^{j 2\pi k t / NT}$$

exists where

$$c_k(NT) = \frac{1}{NT} \int_0^{NT} x(t)\, e^{-j 2\pi k t / NT}\, dt. \tag{8}$$

Define $X(2\pi k / NT)$ by $NT\, c_k(NT)$. The original time function is constructed as
in (5'):

$$x_{NT}(t) = \frac{1}{NT} \sum_{k=-\infty}^{\infty} X(2\pi k / NT)\, e^{j 2\pi k t / NT}. \tag{9}$$

To (3) corresponds the Fourier transform with a finite time interval:

$$X(\omega) = \int_0^{NT} x(t)\, e^{-j\omega t}\, dt. \tag{10}$$

Since $x(t) = x_{NT}(t)$ for $0 \le t \le NT$, substitute (9) into (10) and evaluate the
integral

$$X(\omega) = \int_0^{NT} \frac{1}{NT} \sum_k X(2\pi k / NT)\, e^{j 2\pi k t / NT} e^{-j\omega t}\, dt = \sum_k X(2\pi k / NT)\, \frac{1}{NT} \int_0^{NT} e^{-j(\omega - 2\pi k / NT) t}\, dt$$

$$= \sum_k X(2\pi k / NT)\, e^{-j\omega NT/2}\, \frac{\sin(\omega NT / 2)}{\omega NT / 2 - \pi k},$$

where the interchange of $\int$ with $\sum$ is assumed to be legitimate.

With sampled data, (7) holds. With a finite data set, (7) is replaced by

$$X^*(\omega) = \sum_{n=0}^{N-1} x_n e^{-j \omega n T} \tag{11}$$

where $x_n = x(nT)$.

Define X(k) by

$$X(k) = X^*(2\pi k / NT) = \sum_{n=0}^{N-1} x_n e^{-j 2\pi n k / N}. \tag{12}$$

This is the DFT with finite data.

To recover $x_n$ from the X(k) sequence, consider $\frac{1}{N} \sum_{k=0}^{N-1} X(k)\, e^{j 2\pi n k / N}$. When (12)
is substituted into X(k), we obtain

$$\frac{1}{N} \sum_{k=0}^{N-1} \sum_{m=0}^{N-1} x_m e^{-j 2\pi m k / N} e^{j 2\pi n k / N} = \sum_{m=0}^{N-1} x_m \frac{1}{N} \sum_{k=0}^{N-1} e^{j 2\pi k (n - m) / N}.$$

Using the identity

$$\frac{1}{N} \sum_{k=0}^{N-1} e^{j 2\pi m k / N} = \begin{cases} 1, & m = 0 \\ 0, & m \ne 0, \end{cases}$$

the above reduces to $x_n$, i.e., we have established

$$x_n = \frac{1}{N} \sum_{k=0}^{N-1} X(k)\, e^{j 2\pi n k / N}.$$
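
The transform pair (12) and its inverse can be written out directly as matrix products. The sketch below is a plain $O(N^2)$ illustration rather than a fast algorithm; `numpy.fft.fft` uses the same sign convention as (12), which gives an independent check.

```python
import numpy as np

def dft(x):
    """X(k) = sum_n x_n exp(-j 2 pi n k / N), as in (12)."""
    N = len(x)
    n = np.arange(N)
    W = np.exp(-2j * np.pi * np.outer(n, n) / N)   # W[k, n] = exp(-j 2 pi n k / N)
    return W @ np.asarray(x, dtype=complex)

def idft(X):
    """x_n = (1/N) sum_k X(k) exp(+j 2 pi n k / N)."""
    N = len(X)
    n = np.arange(N)
    W = np.exp(2j * np.pi * np.outer(n, n) / N)
    return W @ np.asarray(X, dtype=complex) / N

x = np.array([1.0, 2.0, 0.0, -1.0, 3.0])
x_back = idft(dft(x))
```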

Now suppose $\{x_n\}$ is a mean-zero weakly stationary stochastic sequence. The
variance of (11) defines

$$S(\omega) = \lim_{N \to \infty} \frac{1}{N} E\, |X^*(\omega)|^2 = \sum_{k=-\infty}^{\infty} R_k e^{-j \omega k T}$$

as the (power) spectrum of the time series $\{x_n\}$, where $R_k = E(x_{n+k} x_n')$ is the
covariance.
A.5 The z-transform

The z-transform of a sequence $\{x_n\}$ is formally defined by

$$X(z) = \sum_{n=-\infty}^{\infty} x_n z^{-n}. \tag{1}$$

The one-sided z-transform is defined by $X(z) = \sum_{n=0}^{\infty} x_n z^{-n}$, implicitly
assuming that we are interested only in that part of the sequence $\{x_n\}$ beyond
the initial time n = 0. A way to recover $x_k$ from X(z) is to calculate

$$x_k = \frac{1}{2\pi j} \oint_{|z|=1} X(z)\, z^{k-1}\, dz \tag{2}$$

where the integral is carried out around the unit circle, |z| = 1.

A typical one-sided z-transform arises in characterizing dynamic
(impulse) responses of linear systems. Although dynamic systems are described
or characterized in many ways, one common way is to give a dynamic system's
impulse response functions or sequence, i.e., dynamic multiplier sequences,
because then the (zero-state) response to any other input (exogenous)
sequence is describable by

$$y(z) = H(z)\, U(z)$$

where

$$H(z) = \sum_{i=0}^{\infty} h_i z^{-i}$$

is the z-transform of the impulse responses.

Dynamic systems with rational transfer functions are stable if their
poles are located in |z| < 1. An example of two-sided z-transforms is the
covariance generating function of a weakly stationary stochastic time series.
We return to these topics later.

This definition shows that the operation of forming z-transforms is
linear. The z-transform of a sequence, $\{a x_n + b y_n\}$, made up of a sum of
scalar multiples of two other sequences $\{x_n\}$ and $\{y_n\}$ equals aX(z) + bY(z),
where X(z) and Y(z) are the z-transforms of these two sequences, respectively.

Z-transforms can be discussed on at least two levels. On one level,
the variable $z^{-1}$ merely serves as a place marker in a representation of a
sequence. For example, $z^{-7}$ is associated with $x_7$ and serves to single
out $x_7$ from X(z). This is certainly convenient when an infinite sum is
formally put in a closed form, as when $1 + z^{-1} + z^{-2} + \ldots$ is represented
by $1/(1 - z^{-1})$. The role of z or $z^{-1}$ as place markers is evident in the
definitions of generating functions in statistics, probability and other
disciplines. On this level, we do not worry about the convergence of the
formal series associated with the z-transforms. Infinite sequences are
merely conveniently and compactly represented as formal power series. For
example, this view is useful in relating two series that are defined by
convolution:

$$c_i = \sum_j a_{i-j} b_j$$

because the z-transform of $\{c_i\}$, which equals A(z)B(z) where A(z) and B(z)
are the z-transforms of $\{a_i\}$ and $\{b_j\}$ respectively, can be used to recover $\{c_i\}$.
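
For finite one-sided sequences the place-marker view is easy to check: multiplying the two z-transforms is the same as convolving the coefficient sequences. A minimal sketch with arbitrary illustrative sequences (the helper `ztransform` is a hypothetical name):

```python
import numpy as np

def ztransform(seq, z):
    """One-sided z-transform of a finite sequence: sum_n seq[n] * z**(-n)."""
    return sum(s * z ** (-n) for n, s in enumerate(seq))

a = np.array([1.0, 0.5, 0.25])
b = np.array([2.0, -1.0])
c = np.convolve(a, b)          # c_i = sum_j a_{i-j} b_j

z0 = 1.3 + 0.4j                # any nonzero evaluation point works for finite sequences
lhs = ztransform(c, z0)
rhs = ztransform(a, z0) * ztransform(b, z0)
```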



Equation (1) shows that $z^{-1} X(z)$ corresponds to a sequence $\{y_n\}$ where
$y_n = x_{n-1}$ because of the relation

$$z^{-1} X(z)\, z^{n-1} = X(z)\, z^{n-2}$$

in the integrand of (2). In other words, the multiplication by $z^{-1}$ is a
backward shift operation, $z^{-1} x_n = x_{n-1}$. The lag operation L in econometrics
is the same as multiplication by $z^{-1}$. The same holds for one-sided z-transforms.

The z-transform of the sequence $\{h_n\}$, where $h_n = L x_n = x_{n-1}$, then
is constructed by

$$H(z) = \sum_{n=0}^{\infty} h_n z^{-n} = \sum_{n=0}^{\infty} x_{n-1} z^{-n} = x_{-1} + z^{-1} \sum_{m=0}^{\infty} x_m z^{-m} = x_{-1} + z^{-1} X(z).$$

If $x_{-1}$ is zero, then $H(z) = z^{-1} X(z)$. Let $f_n = x_{n+1}$. Then

$$F(z) = \sum_{n=0}^{\infty} f_n z^{-n} = z \sum_{n=0}^{\infty} x_{n+1} z^{-(n+1)} = z X(z) - z x_0.$$

These two cases show
that multiplication by z corresponds to forward shift and multiplication
by $z^{-1}$ means backward shift in the time domain.

Using one-sided z-transforms we can solve difference equations by
converting them into algebraic ones just as Laplace transforms allow us to
solve differential equations algebraically.

Example. Given $y(z) = \sum_{t=0}^{\infty} y_t z^{-t}$, $\{y_{t+3}\}$ produces $z^3 y(z) - z^3 y_0 - z^2 y_1 - z y_2$
as its z-transform.

Example. The zero-initial-condition solution of $y_{k+n} + a_{n-1} y_{k+n-1} + \ldots
+ a_0 y_k = g_k$ has its z-transform $y(z) = G(z) / [z^n + a_{n-1} z^{n-1} + \ldots
+ a_0]$, where G(z) and y(z) are the (one-sided) z-transforms of
$\{g_k\}$ and $\{y_k\}$ respectively.

Example. Let

$$x_{t+1} = A x_t + b u_t, \qquad y_t = c x_t + d u_t.$$

Then

$$z X(z) = A X(z) + b U(z) + z x_0, \qquad y(z) = c X(z) + d U(z).$$

Hence

$$y(z) = \{c (zI - A)^{-1} b + d\}\, U(z) + z c (zI - A)^{-1} x_0$$

where the first part is the z-transform of the zero-state response, and the
second is that of the zero-input response.
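
The zero-state part of the last example can be checked numerically: expanding $(zI - A)^{-1}$ in powers of $z^{-1}$ gives the impulse responses $h_0 = d$, $h_i = c A^{i-1} b$ for $i \ge 1$, and their z-transform should reproduce $c(zI - A)^{-1} b + d$ wherever the series converges. The system matrices below are arbitrary assumed values.

```python
import numpy as np

A = np.array([[0.5, 0.1], [0.0, 0.3]])
b = np.array([1.0, 1.0])
c = np.array([1.0, -1.0])
d = 0.5

def transfer(z):
    """Closed form c (zI - A)^{-1} b + d."""
    return d + c @ np.linalg.solve(z * np.eye(2) - A, b)

def impulse_response(n_terms):
    """h_0 = d and h_i = c A^{i-1} b for i >= 1."""
    h, v = [d], b.copy()
    for _ in range(n_terms - 1):
        h.append(c @ v)
        v = A @ v
    return np.array(h)

z0 = 1.2                                   # outside the spectral radius of A (0.5)
h = impulse_response(200)
series_value = np.sum(h * z0 ** (-np.arange(len(h))))
```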

On the second and more sophisticated level, z-transforms are treated as
defining analytic functions in some region of the complex plane. In some
cases, the formal power series of z do converge in some region of the complex
plane, thus defining analytic functions. If the infinite series converge
for $z = e^{j\omega T}$ with some T, then the z-transform evaluated at this z value,
$X(e^{j\omega T})$, is the Fourier transform of a sampled sequence of a continuous
function of time with sampling interval T.

By identifying z with $e^{j\omega T}$, we recognize (2) as the formula for the
coefficient of the Fourier series expansion

$$x_k = \frac{T}{2\pi} \int_{-\pi/T}^{\pi/T} X(e^{j\omega T})\, e^{j\omega k T}\, d\omega.$$

The z-transforms are thus related to the Fourier transforms when some
specific values are assigned to z. To see this we proceed as follows.

Suppose that x(t) has the Fourier transform $\tilde{X}(\omega)$ where

$$\tilde{X}(\omega) = \int_{-\infty}^{\infty} x(t)\, e^{-j\omega t}\, dt.$$

Its inverse transform is

$$x(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \tilde{X}(\omega)\, e^{j\omega t}\, d\omega.$$

So the value of x(·) sampled periodically with a time interval T is given by

$$x(nT) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \tilde{X}(\omega)\, e^{j n \omega T}\, d\omega.$$

Dividing the interval of integration into segments of length $2\pi / T$ each, let
us rewrite the above as

$$x(nT) = \frac{1}{2\pi} \sum_{m=-\infty}^{\infty} \int_{(2m-1)\pi/T}^{(2m+1)\pi/T} \tilde{X}(\omega)\, e^{j \omega n T}\, d\omega.$$

If we change the variable of integration from $\omega$ to $\omega' = \omega - 2\pi m / T$, we can
rewrite the above as

$$x(nT) = \frac{1}{2\pi} \sum_{m=-\infty}^{\infty} \int_{-\pi/T}^{\pi/T} \tilde{X}(\omega' + 2\pi m / T)\, e^{j \omega' n T}\, d\omega',$$

where we use $e^{j 2\pi m n} = 1$.

Suppose the expression

$$X^*(\omega) = \frac{1}{T} \sum_{m=-\infty}^{\infty} \tilde{X}(\omega + 2\pi m / T)$$

is well defined.

Then the value of x(·) at t = nT can be related to $X^*(\omega)$ by

$$x(nT) = \frac{T}{2\pi} \int_{-\pi/T}^{\pi/T} X^*(\omega)\, e^{j \omega n T}\, d\omega. \tag{3}$$

The function $X^*(\omega)$ is periodic with period $2\pi / T$ because

$$X^*(\omega + 2\pi / T) = \frac{1}{T} \sum_{m=-\infty}^{\infty} \tilde{X}(\omega + 2\pi (m + 1) / T) = X^*(\omega).$$

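A periodic function of t with period $\Omega$ can be represented by a Fourier series

$$x(t) = \sum_{n=-\infty}^{\infty} c_n e^{-j 2\pi n t / \Omega}$$

where integration of both sides from $-\Omega/2$ to $\Omega/2$ after multiplying both
sides by $e^{j 2\pi m t / \Omega}$ yields

$$\int_{-\Omega/2}^{\Omega/2} x(t)\, e^{j 2\pi m t / \Omega}\, dt = \sum_{n=-\infty}^{\infty} c_n \int_{-\Omega/2}^{\Omega/2} e^{j 2\pi (m - n) t / \Omega}\, dt = \Omega\, c_m$$

where we use

$$\frac{1}{\Omega} \int_{-\Omega/2}^{\Omega/2} e^{j 2\pi n t / \Omega}\, dt = \begin{cases} 1, & n = 0 \\ 0, & n \ne 0. \end{cases}$$

Thus

$$c_m = \frac{1}{\Omega} \int_{-\Omega/2}^{\Omega/2} x(t)\, e^{j 2\pi m t / \Omega}\, dt.$$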


Now formally represent $X^*(\omega)$ by a Fourier series. Since $X^*(\omega)$ is periodic
with period $2\pi / T$, its Fourier coefficient is

$$c_n = \frac{T}{2\pi} \int_{-\pi/T}^{\pi/T} X^*(\omega)\, e^{j n \omega T}\, d\omega = x(nT).$$

Now compare this with (3) to see that x(nT) is the n-th Fourier coefficient
of $X^*(\omega)$, i.e.,

$$X^*(\omega) = \sum_{n=-\infty}^{\infty} x(nT)\, e^{-j n \omega T}$$

is the Fourier series of $X^*(\omega)$.

Define a function X(z) by

$$X(z) = \sum_{n=-\infty}^{\infty} x(nT)\, z^{-n}.$$

We recognize X(z) as the z-transform of x(·). Then

$$X^*(\omega) = X(e^{j\omega T}).$$

To recover x(nT) from X(z), set z to $e^{j\omega T}$ and note that $d\omega = dz / (jTz)$ to
rewrite (3) as

$$x(nT) = \frac{1}{2\pi j} \oint_{|z|=1} X(z)\, z^{n-1}\, dz.$$

This is the inverse z-transform given as (2).

Fourier series are also defined for functions defined on the closed
interval $[-\pi, \pi]$. The Fourier coefficients are defined by

$$c_n = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x)\, e^{-inx}\, dx, \quad n = 0, \pm 1, \ldots,$$

if the integral exists. We follow Hoffman (1962) to characterize a class
of analytic functions: Let $H^2$ denote the class of analytic functions f
in |z| < 1 for which the functions $f_r(\theta) = f(re^{i\theta})$ are bounded in $L^2$-norm
as $r \to 1$, i.e., $\|f_r\|^2 = \frac{1}{2\pi} \int_{-\pi}^{\pi} |f(re^{i\theta})|^2\, d\theta$ remains bounded as $r \to 1$.
The space $H^2$ then is identified with a closed subspace of $L^2$ of the
circle:

$$H^2 = \Big\{ f \in L^2 : \int_{-\pi}^{\pi} f(\theta)\, e^{in\theta}\, d\theta = 0, \; n = 1, 2, \ldots \Big\}.$$

In other words, the element $f \in H^2$ has a one-sided z-transform $f(z) = \sum_{n=0}^{\infty} a_n z^{-n}$,
where $z = e^{-i\theta}$, because the Fourier coefficients vanish for negative integers. The
shift operator T can be defined on $\ell^2$, the space of square summable sequences
of complex numbers, by

$$T(a_0, a_1, a_2, \ldots) = (0, a_0, a_1, \ldots)$$

where

$$\sum_i |a_i|^2 < \infty,$$

or on $H^2$ by

$$(Tf)(\theta) = e^{i\theta} f(\theta) \quad (= z^{-1} f(z)).$$
A.6 Some Useful Relations for Quadratic Forms

Here we collect some useful formulas involving quadratic forms. Most of the
results are found in Bellman [1960] but are collected here for easy reference.

1. Completion of the square can be used to find the minimum and the minimizing
expression in

$$\min_x\, (a + 2b'x + x'Qx) = a - b'Q^{-1}b$$

where Q' = Q > 0, and the minimum is attained at $x = -Q^{-1} b$.

This follows by writing

$$a + 2b'x + x'Qx = a + (x + Q^{-1}b)'\, Q\, (x + Q^{-1}b) - b'Q^{-1}b.$$

2. The extrema of x'Qx subject to x'x = 1 are related by

$$\lambda_{\min}(Q) \le x'Qx / x'x \le \lambda_{\max}(Q),$$

or Min x'x subject to x'Qx = 1 may analogously be phrased.

3. The minimum of x'Qx subject to a linear constraint Ax = z, where A is an
m x n matrix of rank m, is achieved by $x = Q^{-1} A' (A Q^{-1} A')^{-1} z$.

4. The matrix solution of the linear differential equation

$$\dot{X} = AX + XB, \quad X(0) = C$$

is given by $X(t) = e^{At} C e^{Bt}$.

5. If $X = -\int_0^{\infty} e^{At} C e^{Bt}\, dt$ exists for all C, then it is a unique solution of
AX + XB = C. To see this, consider $\dot{Z} = AZ + ZB$, Z(0) = C. Assuming that $Z(t) \to
0$ as $t \to \infty$, integrate the differential equation to see that

$$-C = A \Big( \int_0^{\infty} Z(s)\, ds \Big) + \Big( \int_0^{\infty} Z(s)\, ds \Big) B.$$

6. Using the Kronecker product notation, the matrix equation AX + XB = C is
converted into $(A \otimes I + I \otimes B')\, \mathrm{vec}\, X = \mathrm{vec}\, C$, where $A \otimes B = (a_{ij} B)$ and vec X
stacks the rows of X into a single column. The matrix
$A \otimes B$ has eigenvalues $\lambda_i \mu_j$ where $\lambda_i$ is the i-th eigenvalue of A and $\mu_j$ is the
j-th eigenvalue of B.

The matrix equation AX + XB = C has a solution for all C if and only if
the eigenvalues of A and B do not cancel each other out, i.e., $\lambda_i + \mu_j \ne 0$ for all
i and j. The uniqueness follows from linearity.
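
Item 6 translates directly into a small linear solve. The sketch below assumes the row-stacking convention for vec, which matches NumPy's default (row-major) `reshape`; with column stacking the two Kronecker factors would be arranged differently. The function name and matrices are illustrative.

```python
import numpy as np

def solve_sylvester_kron(A, B, C):
    """Solve AX + XB = C via (A kron I + I kron B') vec X = vec C, row-major vec."""
    n, m = C.shape
    K = np.kron(A, np.eye(m)) + np.kron(np.eye(n), B.T)
    x = np.linalg.solve(K, C.reshape(-1))      # reshape(-1) stacks the rows of C
    return x.reshape(n, m)

A = np.array([[1.0, 2.0], [0.0, 3.0]])                             # eigenvalues 1, 3
B = np.array([[4.0, 1.0, 0.0], [0.0, 5.0, 1.0], [0.0, 0.0, 6.0]])  # eigenvalues 4, 5, 6
C = np.arange(6.0).reshape(2, 3)
X = solve_sylvester_kron(A, B, C)
```

Since no eigenvalue of A is the negative of an eigenvalue of B, the system is nonsingular and `A @ X + X @ B` should reproduce `C`.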

7. The algebraic equation of the form X - AXA' = Q, where Q' = Q > 0, arises
in several contexts. It is equivalent to $(I - A \otimes A)\, \mathrm{vec}\, X = \mathrm{vec}\, Q$. The matrix
$I - A \otimes A$ has eigenvalues $1 - \lambda_i \lambda_j$; hence $I - A \otimes A$ is nonsingular if and only
if $\lambda_i \lambda_j \ne 1$ for all i, j. The condition $|\lambda(A)| < 1$ is sufficient.

8. Lyapunov theorem

An algebraic equation for a symmetric matrix

$$A'XA - X = -R \tag{1}$$

where R' = R > 0 arises in many contexts. We call the matrix A stable if all
its eigenvalues have modulus less than one. The matrix X is clearly symmetric.
First, the solution matrix X is unique if A is stable. Suppose there are two
solutions. The difference $X_1 - X_2$ obeys $A'(X_1 - X_2)A = (X_1 - X_2)$. To consider
a simple case, suppose A has distinct eigenvalues, $Av = \lambda v$. Multiplying the above
by v from the right, we note that
$\lambda A'(X_1 - X_2)v = (X_1 - X_2)v$, or, if $(X_1 - X_2)v \ne 0$, then $(X_1 - X_2)v$
is an eigenvector of A' with eigenvalue $1/\lambda$. However, $|1/\lambda| > 1$ if $|\lambda| < 1$,
contradicting the assumed stability of A; hence $X_1 = X_2$. The solution is
unique.

By iterating (1), the solution X may be written as the sum of an infinite
series

$$X = R + A'RA + (A')^2 R A^2 + \cdots \tag{2}$$

This is well defined because A is stable. To see this we need only to note

$$v' \Big\{ \sum_{h=0}^{\infty} (A')^h R A^h \Big\} v = (v'Rv) \sum_{h=0}^{\infty} |\lambda|^{2h} = (v'Rv) / (1 - |\lambda|^2) < \infty$$

for any eigenvector v of A. This argument establishes finiteness of (2) when
A has distinct eigenvalues and that X is positive definite. Even with some
repeated eigenvalues, the infinite sum can be shown to be bounded and X to be
positive definite.

The converse is also true because $v'(A'XA - X)v = (|\lambda|^2 - 1)\, v'Xv = -v'Rv < 0$
implies that $|\lambda| < 1$, i.e., A is stable.

These are summarized as the Lyapunov Theorem. The matrix A is stable if and
only if (1) has a unique symmetric positive definite solution.
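
The series (2) gives a simple, if not the most efficient, way to compute X for a stable A. A minimal sketch with an assumed example matrix whose eigenvalues have modulus $\sqrt{0.17} \approx 0.41$:

```python
import numpy as np

def lyapunov_series(A, R, n_terms=500):
    """Partial sum of X = R + A'RA + (A')^2 R A^2 + ... (converges when A is stable)."""
    X = np.zeros_like(R)
    term = R.copy()
    for _ in range(n_terms):
        X += term
        term = A.T @ term @ A
    return X

A = np.array([[0.5, 0.2], [-0.1, 0.3]])   # assumed stable example
R = np.eye(2)
X = lyapunov_series(A, R)
residual = A.T @ X @ A - X + R            # should vanish if X solves (1)
```

The computed X should be symmetric with positive eigenvalues, in line with the theorem.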

9. The integral $J = \int_0^{\infty} (x'Bx)\, dt$, when evaluated along a solution of $\dot{x} = Ax$,
equals $-x(0)'\, Y x(0)$, where $A'Y + YA = B$ or $Y = -\int_0^{\infty} e^{A't} B e^{At}\, dt$. This can be
seen by integrating $\frac{d}{dt}(x'Yx) = x'Bx$.
A.7 Calculation of the Inverse, $(zI - A)^{-1}$

A recursive procedure for calculating $(zI - A)^{-1}$ is available: See
Aoki [1976; p.45].

$$(zI - A)^{-1} = \frac{1}{d(z)} \big[ I z^{n-1} + B_1 z^{n-2} + \cdots + B_{n-1} \big]$$

where

$$d(z) = |zI - A| = z^n + a_{n-1} z^{n-1} + \cdots + a_0$$

and

$$B_1 = A + a_{n-1} I,$$
$$B_\ell = A B_{\ell-1} + a_{n-\ell} I, \quad \ell = 2, \ldots, n-1,$$
$$0 = A B_{n-1} + a_0 I.$$

When this algorithm is applied to a single-input-single-output system (A, b,
c) in the phase canonical form we can readily establish that $B_i b = e_{n-i}$, $i =
1, \ldots, n-1$, where $e_k$ denotes the k-th unit vector. Hence, with $c = (c_0, \ldots, c_{n-1})$,
we can write the transfer function as

$$c (zI - A)^{-1} b = (c_{n-1} z^{n-1} + \cdots + c_0) / d(z).$$
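
The recursion can be sketched as follows. As a simplifying assumption, the characteristic polynomial coefficients are taken from `numpy.poly` rather than generated by the trace formula of Leverrier's method; the recursion for the $B_\ell$ is the one stated above, and the final relation $A B_{n-1} + a_0 I = 0$ serves as a check.

```python
import numpy as np

def resolvent_coefficients(A):
    """Return a = [1, a_{n-1}, ..., a_0] and [I, B_1, ..., B_{n-1}]."""
    n = A.shape[0]
    a = np.real(np.poly(A))                    # coefficients of |zI - A|, highest power first
    Bs = [np.eye(n)]                           # the matrix multiplying z^{n-1}
    for l in range(1, n):
        Bs.append(A @ Bs[-1] + a[l] * np.eye(n))   # B_l = A B_{l-1} + a_{n-l} I
    return a, Bs

def resolvent(A, z):
    """Evaluate (zI - A)^{-1} from the matrix polynomial divided by d(z)."""
    a, Bs = resolvent_coefficients(A)
    n = A.shape[0]
    numerator = sum(B * z ** (n - 1 - l) for l, B in enumerate(Bs))
    return numerator / np.polyval(a, z)

A = np.array([[0.5, 0.1, 0.0], [0.2, 0.3, 0.1], [0.0, 0.4, 0.2]])   # assumed example
z0 = 2.0 + 0.5j
a, Bs = resolvent_coefficients(A)
```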
A.8 Sensitivity Analysis of Optimal Solutions: Scalar-Valued Case

Asymptotic behavior of the optimally controlled system state vector is
the same, i.e., the state vector approaches zero for any choice of weighting
matrices Q and R, provided that the dynamics are controllable and (A, H) is
detectable, where Q = H'H. The latter condition depends on which components
of the state vector are actually included in the performance indices.

Dynamic behavior, i.e., the manner of approaching zero, however, is
influenced by our choice of Q and R. The transient response of the dynamics
with the optimal control is determined by the eigenvalues of the matrix
(A - BK*) where $K^* = R^{-1} B' P^*$ and P* is the solution of the matrix Riccati
equation.

We now conduct a kind of root-locus analysis for a special class of
dynamics. See Aoki [1976, 1981] for some description of the root-locus
method. We limit ourselves to problems with scalar-valued decision variables
and scalar-valued data and correspondingly specialize R to a scalar r and
B to a vector b, and Q = hh' where h is a column vector. We follow Kailath
[1980] in our development. The transfer function between u and y = h'x is
then

$$h' (sI - A)^{-1} b = n(s) / d(s). \tag{1}$$

Here d(s) = |sI - A| is the characteristic polynomial of A, and we
assume that d and n have no common factors. (This follows if the dynamics
are controllable and observable as we have assumed.)

The Riccati equation becomes A'P + PA - Pbb'P/r + hh' = 0, and the
optimal feedback gain is k' = b'P/r. First, rearrange the Riccati equation by
adding and subtracting sP, where s is the Laplace transform variable, to read

$$P(sI - A) + (-sI - A')P + P b\, r^{-1} b' P = hh'.$$

Multiply the above from the left by $b'(-sI - A')^{-1}$ and from the right by
$(sI - A)^{-1} b$, and divide by r, to rewrite the above as

$$b'(-sI - A')^{-1} k + k'(sI - A)^{-1} b + b'(-sI - A')^{-1} k k' (sI - A)^{-1} b = r^{-1} b'(-sI - A')^{-1} h h' (sI - A)^{-1} b. \tag{*}$$

Let the characteristic polynomial of the closed-loop system, |sI - A + bk'|,
be written as

$$d_k(s) = d(s) \{ 1 + k'(sI - A)^{-1} b \}$$

where we use the identity |I + cd'| = 1 + d'c. See Aoki [1976; Appendix B].
The zeros of this polynomial are the eigenvalues of the dynamics with
feedback gain vector k.

Then (*) is used to simplify the expression

$$d_k(-s)\, d_k(s) = d(-s)\, d(s) \{ 1 + r^{-1} b'(-sI - A')^{-1} h h' (sI - A)^{-1} b \} = d(-s)\, d(s) + r^{-1} n(s)\, n(-s)$$

where we use $b'(-sI - A')^{-1} h = h'(-sI - A)^{-1} b = n(-s)/d(-s)$.

Next we claim that the eigenvalues of (A - br ^b'P) which determine 
the transient response of the optimally controlled dynamics are the n 
stable roots of the polynomial d^(-s)d^.(s), where n is the degree of the 
polynomial d(s) , i.e., the parametric dependence of eigenvalues or r is 
exhibited by* 

(2) d(-s)d(s) + r 1 n(-s)n(s) = 0. 

This gives a generalized sort of the root- locus plot for eigenvalues. For . 
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large r, i. e, with heavy cost of control, the transient responses are 
governed by eigenvalues which are approximately equal to the roots of d(s) 

= 0, i.e. , the controlled system eigenvalues are near those of the uncotrolled 
system and the system behaves almost as the uncontrolled dynamics itself. 

With r approaching zero, m of the eigenvalues are given by the roots of

     $n(s) = 0$

where m = deg n. These are the zeros of the uncontrolled system transfer
function (1). The coefficients of the polynomial n(.) can be recursively
determined by Leverrier's method (Aoki [1976; p.45]), for example. The
remaining (n - m) eigenvalues approach infinity along some asymptotes. These
can be determined by retaining the largest terms in |s| in (2):

     $(-1)^n s^{2n} + (-1)^m r^{-1} n_0^2 s^{2m} = 0$

where $n_0$ is the coefficient of $s^m$ in n(s).

The (n - m) asymptotes that lie in the left half s-plane are thus the (n - m)
stable branches of

     $s^{2(n-m)} = (-1)^{n-m-1}(n_0^2/r)$

and $|s| \to \infty$.
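The dependence of the closed-loop eigenvalues on r can be traced numerically. The following is a minimal sketch, not part of the original notes: it assumes NumPy, a hypothetical second-order example (A, b, h), and helper names of our own choosing. It forms the polynomial in (2), using $|sI - A + bh'| = d(s) + n(s)$ to obtain n(s), and returns its stable roots.

```python
import numpy as np

def reflect(c):
    """Coefficients of p(-s), given those of p(s) (highest power first)."""
    c = np.asarray(c, dtype=float)
    return c * (-1.0) ** np.arange(len(c) - 1, -1, -1)

def locus_polynomial(A, b, h, r):
    """Coefficients of d(-s)d(s) + n(-s)n(s)/r, the polynomial in (2)."""
    d = np.real(np.poly(A))                        # d(s) = |sI - A|
    n = np.real(np.poly(A - np.outer(b, h))) - d   # n(s) = d(s) h'(sI - A)^{-1} b
    return np.polyadd(np.polymul(reflect(d), d),
                      np.polymul(reflect(n), n) / r)

def stable_roots(A, b, h, r):
    """The roots of (2) lying in the open left half plane, sorted."""
    roots = np.roots(locus_polynomial(A, b, h, r))
    return np.sort_complex(roots[roots.real < 0])

if __name__ == "__main__":
    # Hypothetical example: d(s) = (s + 1)(s + 2), n(s) = s + 3.
    A = np.array([[0.0, 1.0], [-2.0, -3.0]])
    b = np.array([0.0, 1.0])
    h = np.array([3.0, 1.0])
    for r in (1e6, 1.0, 1e-6):
        print(r, stable_roots(A, b, h, r))
```

In this example the stable roots stay near -1 and -2 for large r, while for small r one root approaches the zero at -3 and the other moves out along the negative real axis, as the asymptote formula suggests.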
A.9 Common Factor in ARMA Model and Controllability



This appendix establishes a connection between controllability and 
presence of common factors in ARMA models. To this end, let s be a scalar, 
and b, c, and d be vectors. 

A dynamic system represented as

     (S)  $x_{t+1} = Ax_t + b\varepsilon_t,$
          $y_t = cx_t + d\varepsilon_t$

has the transfer function

     $H(z) = d + c(zI - A)^{-1}b = \psi(z)/\phi(z)$

where

     $\psi(z) = \begin{vmatrix} zI - A & b \\ -c & d \end{vmatrix}$  and  $\phi(z) = |zI - A|,$

hence can be put into an ARMA model form

     $\phi(L)y_t = \psi(L)\varepsilon_t.$

In the above, a matrix identity*

     $\begin{pmatrix} I & 0 \\ c(zI - A)^{-1} & 1 \end{pmatrix}\begin{pmatrix} zI - A & b \\ -c & d \end{pmatrix} = \begin{pmatrix} zI - A & b \\ 0 & H(z) \end{pmatrix}$

and the corresponding determinantal equality are used to show that $\phi(z) \cdot H(z)$
equals $\psi(z)$.

* Or more directly, use the matrix identity
  $\begin{vmatrix} zI - A & b \\ -c & d \end{vmatrix} = |zI - A| \cdot (d + c(zI - A)^{-1}b).$

We now show that if $\phi(\cdot)$ and $\psi(\cdot)$ have a common factor then (S) is either
not controllable or not observable. Suppose $\phi(z_1) = 0$ and $\psi(z_1) = 0$ where $z_1$
is one of the eigenvalues of A. (From $\phi(z) = |zI - A|$, the roots of $\phi(\cdot)$
are all eigenvalues of A.) Vanishing $\psi(z_1)$ implies that there exists a
vector $(\xi, \eta)$ not identically zero such that

     $\begin{pmatrix} z_1 I - A & b \\ -c & d \end{pmatrix}\begin{pmatrix} \xi \\ \eta \end{pmatrix} = 0.$



If $\eta$ is zero, then $\xi \neq 0$, $(z_1 I - A)\xi = 0$ and $c\xi = 0$, or
$\xi'[c', A'c', \ldots, (A')^{n-1}c'] = 0$ with $\xi \neq 0$. This means that the system (S) is not observable.
If $\eta \neq 0$, then $(z_1 I - A)\xi + b\eta = 0$. Let $\phi(z) = (z - z_1)\tilde{\phi}(z)$. On multiplying
by $\tilde{\phi}(A)$ from the left, this equation becomes $\tilde{\phi}(A)b\eta = 0$, because
$\tilde{\phi}(A)(z_1 I - A) = -\phi(A)$ vanishes by the Cayley-Hamilton theorem. Hence
$\tilde{\phi}(A)b = 0$. Since $\tilde{\phi}$ is a monic polynomial of degree n - 1, the vectors
b, Ab, ..., $A^{n-1}b$ are linearly dependent, i.e., (A, b) is not a
controllable pair.

The converse is also straightforward to show. (See Kailath [1980] or
Chen [1970] for example.)
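The connection between a common root of the two polynomials and loss of controllability can be checked on a small case. The sketch below is illustrative only: it assumes NumPy, a hypothetical diagonal A, and function names of our own. It computes $\phi(z) = |zI - A|$ and $\psi(z) = \phi(z)(d + c(zI - A)^{-1}b)$ from the determinant identity $|zI - A + bc| = \phi(z)(1 + c(zI - A)^{-1}b)$.

```python
import numpy as np

def arma_polys(A, b, c, d):
    """Return (phi, psi) as coefficient arrays, highest power first."""
    phi = np.real(np.poly(A))                          # phi(z) = |zI - A|
    num = np.real(np.poly(A - np.outer(b, c))) - phi   # c adj(zI - A) b
    return phi, d * phi + num                          # psi = phi * (d + c (zI - A)^{-1} b)

def controllability_rank(A, b):
    """Rank of [b, Ab, ..., A^{n-1} b]."""
    n = A.shape[0]
    cols = [np.linalg.matrix_power(A, k) @ b for k in range(n)]
    return np.linalg.matrix_rank(np.column_stack(cols))

if __name__ == "__main__":
    A = np.diag([0.5, 0.2])
    c = np.array([1.0, 1.0])
    for b in (np.array([1.0, 0.0]), np.array([1.0, 1.0])):
        phi, psi = arma_polys(A, b, c, 1.0)
        print("rank:", controllability_rank(A, b),
              "roots of phi:", np.roots(phi), "roots of psi:", np.roots(psi))
```

With b = (1, 0)' the mode at 0.2 is not reached by the input, and 0.2 appears as a root of both polynomials; with b = (1, 1)' the pair is controllable and the roots are distinct.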
A.10 Non-Controllability and Singular Probability Distribution



If a Markovian model generating a time series $\{y_t\}$ is not controllable,
then the probability distribution of y becomes singular. To see this simply,
let a p-dimensional $y_t$ be generated by a stable dynamics

     $y_{t+1} = Ay_t + b\varepsilon_{t+1}, \quad y_0 = 0$

where rank $[b, Ab, \ldots, A^{p-1}b] < p$, and $\{\varepsilon_t\}$ is the usual mean-zero white
noise sequence.

By this non-controllability assumption, there is a nonzero p-vector a such
that $a'[b, Ab, \ldots, A^{p-1}b] = 0$. By the Cayley-Hamilton theorem, $a'A^s b = 0$
for all $s \geq 0$, i.e., $a'\mathcal{C} = 0$ for $\mathcal{C} = [b, Ab, A^2b, \ldots]$. Because
$y_t = \sum_{s=0}^{t-1} A^s b\varepsilon_{t-s}$ by the dynamics, we note that
$E\{a'y_t\varepsilon_{t-s}\} = 0$ for all $s \geq 0$.

Thus, $a'y_t$ equals zero in the mean square sense. The distribution of
y's is thus confined to some subspace in the space of all mean-zero, finite
variance random variables. This shows up as the rank of the (sample) covariance
matrix of $\{y_t\}$ being less than the dimension of the vector y.
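The rank deficiency of the sample covariance matrix is easy to observe in a simulation. The following sketch is an illustration under our own assumptions (NumPy, Gaussian shocks, a hypothetical 2 x 2 matrix A for which b is an eigenvector, so that (A, b) is not controllable); the function names are ours.

```python
import numpy as np

def controllability_matrix(A, b):
    """[b, Ab, ..., A^{p-1} b]."""
    return np.column_stack([np.linalg.matrix_power(A, k) @ b
                            for k in range(A.shape[0])])

def simulate(A, b, T, seed=0):
    """y_{t+1} = A y_t + b e_{t+1}, y_0 = 0, with standard normal shocks."""
    rng = np.random.default_rng(seed)
    y = np.zeros((T + 1, A.shape[0]))
    for t in range(T):
        y[t + 1] = A @ y[t] + b * rng.standard_normal()
    return y

if __name__ == "__main__":
    A = np.array([[0.3, 0.2], [0.1, 0.4]])   # A b = 0.5 b, eigenvalues 0.5 and 0.2
    b = np.array([1.0, 1.0])
    U, s, _ = np.linalg.svd(controllability_matrix(A, b))
    a = U[:, -1]                              # a'[b, Ab] = 0
    y = simulate(A, b, 500)
    print("singular values of [b, Ab]:", s)
    print("max |a'y_t|:", np.abs(y @ a).max())
    print("rank of sample covariance:", np.linalg.matrix_rank(np.cov(y.T), tol=1e-8))
```

The vector a is taken as the left singular vector belonging to the zero singular value of the controllability matrix; an explicit tolerance is passed to the rank computation because the smallest singular value of the covariance matrix is zero only up to rounding.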

A non-controllable dynamics contains a controllable subsystem by an
appropriate partition of the vector y. This subvector may also be identified
by a suitable partition of the covariance matrix of y. Let $\Sigma = \mathrm{cov}(y)$ and
denote by $\Gamma$ and $\Lambda$ the matrices of eigenvectors and eigenvalues respectively;
$\Sigma\Gamma = \Gamma\Lambda$ where $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_q, 0, \ldots, 0)$, with q < p.

The characteristic function for y is $E(e^{it'y}) = \exp(-\tfrac{1}{2}t'\Sigma t)$. Let
$t = \Gamma\theta$ and denote $\Gamma'y$ by v. Then

     $E(e^{it'y}) = E(e^{i\theta'\Gamma'y}) = E(e^{i\theta'v}) = \exp(-\tfrac{1}{2}\theta'\Gamma'\Sigma\Gamma\theta)$

                  $= \exp(-\tfrac{1}{2}\theta'\Lambda\theta) = \exp(-\tfrac{1}{2}\sum_{i=1}^{q}\lambda_i\theta_i^2),$

showing that $v_1, \ldots, v_q$ are independently distributed and $v_j = 0$,
j = q+1, ..., p, with probability
1 because their characteristic function is 1. (See Aoki [1967; Appendix III].)
A.11 Spectral Decomposition Representation



Suppose an n x n matrix A has n linearly independent eigenvectors $\{u_i\}$ and
n (not necessarily distinct) eigenvalues counting multiplicities. For
example $A = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}$ has $\begin{pmatrix} 1 \\ 0 \end{pmatrix}$ and $\begin{pmatrix} 0 \\ 1 \end{pmatrix}$ as two linearly independent eigenvectors
with eigenvalues $\lambda_1 = \lambda_2 = 2$.

Let $U = [u_1, \ldots, u_n]$ and $V = U^{-1}$ and write the row vectors of V as
$v_i'$, i = 1, ..., n. The vector $u_i$ is the right eigenvector while $v_i$ is
called the left eigenvector of A because $v_i'A = \lambda_i v_i'$.

By definition

     $AU = U\Lambda$

where $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$.

Thus we obtain the spectral decomposition of the matrix A

     $A = U\Lambda V = \sum_{i=1}^{n} \lambda_i u_i v_i'.$



This representation is useful in evaluating dynamic effects because it
effectively represents dynamics as a parallel array of n scalar dynamics.
To illustrate, note that

     $e^{At} = \sum_{i=1}^{n} e^{\lambda_i t} u_i v_i'.$

Therefore the effect of a scalar exogenous variable u on y where

     $y = c'x,$

     $\dot{x} = Ax + bu, \quad x(0) = 0$

can be decomposed into n components

     $y(t) = \sum_{i=1}^{n} e^{\lambda_i t}(c'u_i)(v_i'b)\int_0^t e^{-\lambda_i\tau}u(\tau)d\tau,$

showing that if c is orthogonal to $u_i$, then y(t) does not contain a component
proportional to $e^{\lambda_i t}$ (i.e., the i-th mode is not observed by y) and that if
$v_i'b$ is zero u does not influence the i-th mode, i.e., the i-th mode is not
controllable.



For other illustrations of such modal decomposition see Aoki [1964 & 1968].
A.12 Singular Value Decomposition Theorem

Effects of small perturbations are essential parts of many analyses
related to stability, optimality and parameter sensitivity. Dynamic systems
are assessed for their structural stability and parameter sensitivity, and
variational dynamics are used to study "neighboring" time paths of
solutions. Solutions of algebraic equations (overdetermined or otherwise)
are incomplete unless some "condition numbers" are calculated to indicate
the degree of robustness of solutions or ill-posedness of the problem formulation.
Similarly, computational error analysis is a must in evaluating algorithms
in numerical analysis. In statistics, principal component analysis, canonical
correlation analysis and the like exist to perform similar functions.

Here we examine singular value decomposition as a tool for unifying
sensitivity analysis in some time series analysis, as well as a practical
way for determining ranks of numerically determined matrices. The fact
that any (m x n) matrix A is expressible as $A = U\Sigma V^*$ where $U^*U = I_m$, $V^*V = I_n$
and $\Sigma = \mathrm{diag}(\Sigma_r, 0)$, $\Sigma_r = \mathrm{diag}(\sigma_1, \ldots, \sigma_r)$ where r = rank A, is known as
the singular value decomposition theorem. (See Strang [1973] or Golub and
Reinsch [1970] for example. The proof is summarized in Appendix.) Using
this decomposition we can easily establish some properties of rectangular
matrices (see Appendix for proof):

(i) $A^* = V\Sigma'U^*$

(ii) $A^*A = V\Sigma'\Sigma V^*$, $AA^* = U\Sigma\Sigma'U^*$

(iii) Let A, B be (m x n) matrices. Then

     $\|A - B\| = \max_{x \neq 0} \|(A - B)x\| / \|x\|$

                = the largest singular value of (A - B).

(iv) $A^+ = V\Sigma^+U^*$ where $\Sigma^+ = \mathrm{diag}(\Sigma_r^{-1}, 0)$
is the Moore-Penrose pseudo-inverse. (See Aoki [1967], Appendix II also.)

(v) The condition number of A is $\sigma_1/\sigma_n$.

(vi) Let A be n x n with eigenvalues $\lambda_1, \ldots, \lambda_n$ arranged in the order
of decreasing magnitude. Then $\sigma_1 \geq |\lambda_i| \geq \sigma_n$ and cond(A) $\geq |\lambda_1/\lambda_n|$.

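An easy application is to the sensitivity analysis of solutions of
algebraic equations. Let Ax = b where A is n x n. A slight error in
specifying b produces an error in x: $A(x + \Delta x) = b + \Delta b$ or $A\Delta x = \Delta b$. Using
the singular value decomposition $A = U\Sigma V^*$, define $\tilde{b} = U^*b$, $\Delta\tilde{b} = U^*\Delta b$,
$\tilde{x} = V^*x$ and $\Delta\tilde{x} = V^*\Delta x$.
Then $\sigma_i\tilde{x}_i = \tilde{b}_i$ and $\sigma_i\Delta\tilde{x}_i = \Delta\tilde{b}_i$. From
$\sigma_n \leq \|b\|/\|x\| \leq \sigma_1$ and $\sigma_n \leq \|\Delta b\|/\|\Delta x\| \leq \sigma_1$, we can bound

     $(\mathrm{cond}(A))^{-1} \leq \frac{\|\Delta x\|/\|x\|}{\|\Delta b\|/\|b\|} \leq \mathrm{cond}(A).$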



Suppose now that A is m x n where rank(A) = r. The singular value
decomposition with $\tilde{b} = U^*b$ and $\tilde{x} = V^*x$ shows that

     $\sigma_i\tilde{x}_i = \tilde{b}_i, \quad i = 1, \ldots, r,$

and

     $0 = \tilde{b}_i, \quad i = r+1, \ldots, m.$

The solution, then, is $\tilde{x}_i = \tilde{b}_i/\sigma_i$, $1 \leq i \leq r$, and $\tilde{x}_i$ is undetermined for
$i \geq r+1$.

From

     $\|b - Ax\|^2 = \sum_{i=1}^{r} |\tilde{b}_i - \sigma_i\tilde{x}_i|^2 + \sum_{i=r+1}^{m} |\tilde{b}_i|^2 \geq \sum_{i=r+1}^{m} |\tilde{b}_i|^2,$

$\tilde{x}_i = \tilde{b}_i/\sigma_i$, i = 1, ..., r, is the least squares solution and
$\tilde{x}_i = 0$, $i \geq r+1$, produces the minimum norm
solution, i.e., $x = A^+b$.
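The construction of the minimum norm least squares solution can be followed step by step in code. This is a sketch assuming NumPy; the function name, the relative tolerance used to decide the numerical rank, and the rank-one example are our own choices.

```python
import numpy as np

def min_norm_lstsq(A, b, tol=1e-10):
    """Minimum norm least squares solution of Ax = b via the SVD A = U diag(s) V*."""
    U, s, Vh = np.linalg.svd(A)                 # full matrices U (m x m), Vh (n x n)
    r = int(np.sum(s > tol * s[0]))             # numerical rank
    b_t = U.conj().T @ b                        # b~ = U* b
    x_t = np.zeros(A.shape[1], dtype=b_t.dtype)
    x_t[:r] = b_t[:r] / s[:r]                   # x~_i = b~_i / sigma_i for i <= r, else 0
    return Vh.conj().T @ x_t, r                 # x = V x~

if __name__ == "__main__":
    A = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])   # rank 1
    b = np.array([1.0, 0.0, 1.0])
    x, r = min_norm_lstsq(A, b)
    print("rank:", r, "x:", x)
    print("pinv solution:", np.linalg.pinv(A) @ b)
```

The result agrees with `np.linalg.pinv(A) @ b`, which is the statement $x = A^+b$ in computational form. For a nonsingular square matrix, the ratio of the largest to the smallest singular value returned by `np.linalg.svd` is the condition number of property (v).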
A.13 Hankel Matrices

Here, we cite several problems in which Hankel matrices appear as a part 
of the problem descriptions or solutions. 

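A deterministic counterpart of the prediction problem is to calculate future
values from past input sequences, i.e., assuming no further inputs. We
note that for the model (2) of Chapter 7

     $\begin{pmatrix} y_{t+1} \\ y_{t+2} \\ \vdots \\ y_{t+N} \end{pmatrix} = \begin{pmatrix} C \\ CA \\ \vdots \\ CA^{N-1} \end{pmatrix} x_{t+1}$

where $x_{t+1}$ is related to the current and past inputs with zero initial conditions by

     $x_{t+1} = [B, AB, \ldots, A^tB]\begin{pmatrix} u_t \\ u_{t-1} \\ \vdots \\ u_0 \end{pmatrix}.$

Together, future observations $y_{t+1}, \ldots, y_{t+N}$ are related to the current and past
u's, $u_0, u_1, \ldots, u_t$, by

     $\begin{pmatrix} y_{t+1} \\ \vdots \\ y_{t+N} \end{pmatrix} = H\begin{pmatrix} u_t \\ \vdots \\ u_0 \end{pmatrix}, \quad H = \begin{pmatrix} C \\ CA \\ \vdots \\ CA^{N-1} \end{pmatrix}[B, AB, \ldots, A^tB],$

whose (i, j) block $CA^{i+j-2}B$ depends only on i + j, so that H is a (block) Hankel matrix.

The same matrices $CA^iB$ as in (4) of Chapter 7 appear when the transfer matrix
$C(zI - A)^{-1}B$ is expanded into a Laurent series

     $C(zI - A)^{-1}B = CB/z + CAB/z^2 + CA^2B/z^3 + \cdots$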
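The factorization of the Hankel matrix into an observability-type and a controllability-type matrix can be checked directly. The sketch below assumes NumPy and a hypothetical two-state single-input single-output model; the function names are ours.

```python
import numpy as np

def markov_parameters(A, B, C, count):
    """The matrices C B, C A B, C A^2 B, ... (count of them)."""
    out, M = [], B.copy()
    for _ in range(count):
        out.append(C @ M)
        M = A @ M
    return out

def block_hankel(params, rows, cols):
    """Block Hankel matrix whose (i, j) block is params[i + j] (0-based)."""
    return np.block([[params[i + j] for j in range(cols)] for i in range(rows)])

if __name__ == "__main__":
    A = np.array([[0.5, 0.1], [0.0, 0.3]])
    B = np.array([[1.0], [1.0]])
    C = np.array([[1.0, 2.0]])
    H = block_hankel(markov_parameters(A, B, C, 7), 4, 4)
    print(H)
    print("rank:", np.linalg.matrix_rank(H, tol=1e-10))
```

For this controllable and observable example the rank of the 4 x 4 Hankel matrix equals the state dimension, two, however many rows and columns are used.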






Identifiability 



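Hankel matrices with auto-correlation coefficients as elements arise in some
identification problems. The next ARMA model illustrates. Consider scalar $y_t$
and $\varepsilon_t$ related by

     (*)  $\{1 + \alpha(L)\}y_t = \{1 + \beta(L)\}\varepsilon_t$

where

     $\alpha(L) = \sum_{i=1}^{p} \alpha_i L^i$ and $\beta(L) = \sum_{i=1}^{q} \beta_i L^i$, $q \leq p$.

The unknown parameters are collected into a vector $\theta$,
$\theta = (\alpha_1, \ldots, \alpha_p, \beta_1, \ldots, \beta_q)'$. The output auto-correlation is denoted by
$R_\tau(\theta)$ where $E(y_t y_{t-\tau}) = R_\tau(\theta)$. Now calculate the covariances of (*)
with $y_{t-p-\tau}$, $\tau = 1, \ldots, p$, to obtain

     $[R_\tau, R_{\tau+1}, \ldots, R_{p+\tau-1}]\,a_p = -R_{p+\tau}, \quad \tau = 1, \ldots, p.$

These equations can be arranged as

     $H_p(\theta)a_p(\theta) = r_p(\theta)$

where

     $H_p(\theta) = \begin{pmatrix} R_1(\theta) & \cdots & R_p(\theta) \\ \vdots & & \vdots \\ R_p(\theta) & \cdots & R_{2p-1}(\theta) \end{pmatrix}, \quad a_p(\theta) = (\alpha_p, \ldots, \alpha_1)'$

and

     $r_p(\theta) = -(R_{p+1}(\theta), \ldots, R_{2p}(\theta))'.$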

The matrix $H_p(\theta)$ is a Hankel matrix. The parameters $\beta_1, \ldots, \beta_q$, $q \leq p$,
are related to $a_p(\theta)$ by another matrix which is a Toeplitz matrix
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'0 ’ V • 


. . , R 

P 


0 
PS 

1 — 1 

1 


.. , R 1 
P-1 


W **** 


' * ' R p+q+l 



We note from (*)

     $y_{t+p+j} + \alpha_1 y_{t+p+j-1} + \cdots + \alpha_p y_{t+j} = \varepsilon_{t+p+j} + \beta_1\varepsilon_{t+p+j-1} + \cdots + \beta_p\varepsilon_{t+j}$

(with $\beta_i = 0$ for i > q). Hence for $j \geq 0$, the right hand side is independent of
$\varepsilon_{t-1}, \varepsilon_{t-2}, \ldots$. Therefore,
the conditional predictions of future observations satisfy a linear relation

     $\hat{y}_{t+p+j|t-1} + \alpha_1\hat{y}_{t+p+j-1|t-1} + \cdots + \alpha_p\hat{y}_{t+j|t-1} = 0,$

showing that the space of predictors related by the Hankel matrix or the rows of
H as in the first example of Hankel matrices eventually become dependent on previous
rows. Actually p is the smallest integer such that $\hat{y}_{t+p|t-1}$ linearly
depends on its predecessors $\hat{y}_{t+i|t-1}$, i = 0, ..., p-1. This observation is
important because it generalizes to a vector-valued process and gives a constructive
procedure for Markov models.

Hankel matrices arise in yet another way in approximating a given impulse 
response sequence by that of a dynamics with a rational transfer function. Al- 
though the next example is not the way lower-order models are constructed (this 
method suffers from numerical instability) , its simplicity conveys the idea of 
approximation well. 



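Example: Padé Approximation of a Transfer Function

Suppose a rational approximation b(z)/a(z) is desired to a given impulse
response sequence $\{h_i\}_{i=0}^{\infty}$ where $a(z) = 1 + a_1z^{-1} + \cdots + a_nz^{-n}$ and
$b(z) = b_0 + b_1z^{-1} + \cdots + b_nz^{-n}$. One way is to choose a's and b's so that
the first (2n+1) elements of

     $H(z) = b(z)/a(z) = h_0 + h_1z^{-1} + \cdots + h_nz^{-n} + \cdots$

exactly match the given sequence. This approximation H(z) to $\{h_i\}$ is known as
Padé approximation. Equating the coefficients of $z^{-(n+1)}, z^{-(n+2)}, \ldots, z^{-2n}$
in $a(z)H(z) = b(z)$, respectively, the a's must satisfy

     $\begin{pmatrix} h_1 & h_2 & \cdots & h_n \\ h_2 & h_3 & \cdots & h_{n+1} \\ \vdots & & & \vdots \\ h_n & h_{n+1} & \cdots & h_{2n-1} \end{pmatrix}\begin{pmatrix} a_n \\ a_{n-1} \\ \vdots \\ a_1 \end{pmatrix} = -\begin{pmatrix} h_{n+1} \\ h_{n+2} \\ \vdots \\ h_{2n} \end{pmatrix}$

where the Hankel matrix again appears and

     $b_k = h_k + a_1h_{k-1} + \cdots + a_kh_0, \quad k = 0, 1, \ldots, n.$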
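A small numerical version of this example may help. The sketch assumes NumPy and the convention $a(z) = 1 + a_1z^{-1} + \cdots + a_nz^{-n}$, $b(z) = b_0 + \cdots + b_nz^{-n}$; the function names and the test transfer function are hypothetical. Equating coefficients of $z^{-(n+1)}, \ldots, z^{-2n}$ in $a(z)H(z) = b(z)$ gives a Hankel system for the a's, and the coefficients of $z^0, \ldots, z^{-n}$ then give the b's. As the text cautions, this is meant to convey the idea, not to be a numerically robust model reduction method.

```python
import numpy as np

def impulse_response(b, a, count):
    """First `count` impulse response terms of b(z)/a(z), with a[0] = 1."""
    h = []
    for k in range(count):
        bk = b[k] if k < len(b) else 0.0
        h.append(bk - sum(a[i] * h[k - i] for i in range(1, min(k, len(a) - 1) + 1)))
    return np.array(h)

def pade_from_impulse(h, n):
    """Fit a(z), b(z) of order n to the first 2n + 1 terms h_0, ..., h_{2n}."""
    h = np.asarray(h, dtype=float)
    H = np.array([[h[i + j + 1] for j in range(n)] for i in range(n)])  # h_1 .. h_{2n-1}
    a_rev = np.linalg.solve(H, -h[n + 1:2 * n + 1])     # (a_n, ..., a_1)
    a = np.concatenate(([1.0], a_rev[::-1]))            # (1, a_1, ..., a_n)
    b = np.array([sum(a[i] * h[k - i] for i in range(k + 1)) for k in range(n + 1)])
    return a, b

if __name__ == "__main__":
    h = impulse_response([1.0, 0.5, 0.0], [1.0, -0.9, 0.2], 5)
    print(pade_from_impulse(h, 2))
```

When the sequence is generated by a rational transfer function of the same order and the Hankel matrix is nonsingular, the procedure recovers the generating coefficients.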




Phase-Canonical Transformation 

A somewhat more technical use of a Hankel matrix arises in transforming 
a dynamic (controllable) system into a phase canonical form. 

Consider a single-input single-output Markov model

     $x_{t+1} = Ax_t + bu_t$

     $y_t = cx_t$

where

     $A = J - e_na'$, $a' = (a_0, a_1, \ldots, a_{n-1})$, $b' = e_n' = (0 \ldots 0\ 1)$,

     $c = (c_0, c_1, \ldots, c_{n-r}, 0, \ldots, 0)$

with

     $J = \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & \cdots & 0 & 1 \\ 0 & 0 & \cdots & 0 & 0 \end{pmatrix}.$

The matrix J is a shift matrix. For example $Je_n = e_{n-1}$, where
$e_{n-1} = (0 \ldots 0\ 1\ 0)'$, or $Ja = (a_1, \ldots, a_{n-1}, 0)'$.

A dynamic system

     $\xi_{t+1} = F\xi_t + gu_t$

     $y_t = d\xi_t$

can be put into the phase canonical form by a nonsingular transformation T, i.e.,

     $A = T^{-1}FT$, $b = T^{-1}g$, $c = dT$

where

     $[g, Fg, \ldots, F^{n-1}g] = TH^{-1}$

where

     $H^{-1} = [b, Ab, \ldots, A^{n-1}b]$

or the required transformation is

     $T = [g, Fg, \ldots, F^{n-1}g]H.$

This matrix H turns out to be a Hankel matrix

     $H = \begin{pmatrix} a_1 & a_2 & \cdots & a_{n-1} & 1 \\ a_2 & \cdots & a_{n-1} & 1 & 0 \\ \vdots & & & & \vdots \\ a_{n-1} & 1 & 0 & \cdots & 0 \\ 1 & 0 & \cdots & \cdots & 0 \end{pmatrix}.$

Toeplitz matrices 

Hankel matrices are related to Toeplitz matrices by simple transformations.
Listing the columns in reverse order of a Toeplitz matrix

     $T = \begin{pmatrix} R_0 & R_{-1} & \cdots & R_{-p+1} \\ \vdots & & & \vdots \\ R_{p-1} & \cdots & & R_0 \end{pmatrix}$

produces a Hankel matrix

     $\begin{pmatrix} R_{-p+1} & \cdots & R_{-1} & R_0 \\ \vdots & & & \vdots \\ R_0 & \cdots & & R_{p-1} \end{pmatrix}.$

Let J denote a p by p matrix with ones along the counter-diagonal line and zero
everywhere else,

     $J = \begin{pmatrix} 0 & \cdots & 0 & 1 \\ 0 & \cdots & 1 & 0 \\ \vdots & & & \vdots \\ 1 & 0 & \cdots & 0 \end{pmatrix}.$

This matrix is symmetric and involutory (it is its own inverse), i.e., $J^2 = I$.
Using J the above operation can be expressed as H = TJ.

Rearrangement of the rows of T in the reverse order also results in a different
Hankel matrix. This matrix is related to T by JT:

     $H_r = JT.$

Clearly, Toeplitz matrices can be obtained by pre- or post-multiplication of a
given Hankel matrix by the matrix J as well. By calculating the singular value
decomposition of T and H, we see that both matrices have the same singular values.
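These relations are simple to confirm numerically. The sketch below assumes NumPy; the function names and the lag function used to fill the Toeplitz matrix are hypothetical.

```python
import numpy as np

def toeplitz_from_lags(R, p):
    """p x p Toeplitz matrix with (i, j) entry R(i - j); R maps an integer lag to a number."""
    return np.array([[R(i - j) for j in range(p)] for i in range(p)])

def exchange_matrix(p):
    """Ones on the counter-diagonal, zeros elsewhere."""
    return np.fliplr(np.eye(p))

def is_hankel(M):
    """True if every anti-diagonal of M is constant."""
    return np.allclose(M[:-1, 1:], M[1:, :-1])

if __name__ == "__main__":
    p = 4
    T = toeplitz_from_lags(lambda k: 0.8 ** abs(k) + 0.1 * k, p)
    J = exchange_matrix(p)
    print(is_hankel(T @ J), is_hankel(J @ T))
    print(np.linalg.svd(T, compute_uv=False))
    print(np.linalg.svd(T @ J, compute_uv=False))
```

Because J is orthogonal, multiplying by it on either side leaves the singular values unchanged, which is why T, TJ and JT share them.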

Elsewhere in these lecture notes on time series, we calculate sample covariance
matrices and arrange them to form a Hankel matrix

     $H = \begin{pmatrix} R_1 & R_2 & \cdots & R_p \\ R_2 & R_3 & \cdots & R_{p+1} \\ \vdots & & & \vdots \\ R_p & R_{p+1} & \cdots & R_{2p-1} \end{pmatrix}.$

Reversing the order of the columns results in the Toeplitz matrix

     $T_c = \begin{pmatrix} R_p & R_{p-1} & \cdots & R_1 \\ \vdots & & & \vdots \\ R_{2p-1} & \cdots & & R_p \end{pmatrix}$

which is the south-west corner p by p submatrix of the 2p by 2p covariance matrix

     $T(2p) = \begin{pmatrix} R_0 & R_{-1} & \cdots & R_{-2p+1} \\ R_1 & R_0 & & \vdots \\ \vdots & & \ddots & \\ R_{2p-1} & \cdots & & R_0 \end{pmatrix}.$

Reversing the rows of H yields the north-east corner of T(2p) if the time series
is stationary.

Thus, a Gram-Schmidt orthogonalization works with the p by p main principal
submatrix of T(2p), and the singular value decomposition of the Hankel matrix
made up of covariance matrices basically works with the p by p off-diagonal
submatrix.
A.14 Dual Relations*

Duality concepts or dual relations are important in static optimization 
such as linear, nonlinear programming and dynamic programming. See Aoki [1971; 
Chapt.6] , Whittle [1982; Chapt.16] and numerous other books on linear and non- 
linear programming. 

Here we consider dual relations that arise in intertemporal optimization 
i.e., optimization over time. Comparison of the two sets of recursive relations 
that arise in optimal filtering and regulation of linear dynamic systems or 
the corresponding Riccati equations reveals a remarkable resemblance. Actually,
a one-to-one correspondence can be established for various terms in the recur- 
sions for these two classes of optimization problems, as we shortly demonstrate. 
Calculations of Kalman filter gains, and conditional variances of the estima- 
tion errors are dual in this sense to those of feedback gains and the calcula- 
tions of the so-called cost-to-go in the regulator problems. 

To be more precise, recall that we have established elsewhere that the
minimum of a quadratic cost (so-called cost-to-go)

     $J_{t,N} = \sum_{k=t}^{N} (y_k'Vy_k + d_k'Td_k)$

subject to linear dynamics

     $x_{k+1} = Fx_k + Gd_k,$

     $y_k = Hx_k$

is achieved by a linear reaction rule

     $d_k = -\Gamma_{N-k}x_k$

and the cost is a quadratic function of $x_t$,

     $J_{t,N} = x_t'S_{N-t}x_t,$

     $\Gamma_{N-k} = (T + G'S_{N-k-1}G)^{-1}G'S_{N-k-1}F.$

Here the Riccati equation obeys

     $S_{N-t} = F'S_{N-t-1}F + H'VH - F'S_{N-t-1}G(T + G'S_{N-t-1}G)^{-1}G'S_{N-t-1}F.$

For the filtering problem, the model is

     $x_{t+1} = Ax_t + Bu_t,$

     $y_t = Cx_t + v_t$

with statistics

     $\mathrm{cov}\begin{pmatrix} u_k \\ v_k \end{pmatrix} = \begin{pmatrix} Q & 0 \\ 0 & R \end{pmatrix}.$

The filter gain is given by

     $\Lambda_t = AP_tC'(R + CP_tC')^{-1},$

where the relevant Riccati equation obeys

     $P_{t+1} = AP_tA' + BQB' - AP_tC'(R + CP_tC')^{-1}CP_tA',$

where $P_t$ is the error covariance matrix

     $P_t = \mathrm{cov}(x_t - \hat{x}_{t|t-1}).$

Comparison of the above yields the following correspondence:

     Regulation      Filtering
     F               A'
     G               C'
     H               B'
     V               Q
     T               R
     S               P
     $\Gamma$        $\Lambda'$

Aside from the fact that the time indexing of P, S, $\Lambda$ and $\Gamma$ moves in opposite
directions, i.e., the indexing of the filtering problem goes forward in
time but that of the regulator is in terms of the time to the end of the planning
horizon (this is why S is indexed as $S_{N-t}$; writing $\tau = N - t$, $\tau$ is the time to go and
$x'S_\tau x$ is the cost-to-go, i.e., the cost incurred in the last $\tau$ periods of regulation),
the first three correspondences reveal that the controllability criterion and
the controllability Grammian are dual to the observability criterion
and the observability Grammian.

* This section is based in part on Aoki [1967].
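The duality can be made concrete by running the two Riccati recursions side by side. The sketch below assumes NumPy, the regulator recursion $S \mapsto F'SF + H'VH - F'SG(T + G'SG)^{-1}G'SF$ and the filter recursion $P \mapsto APA' + BQB' - APC'(R + CPC')^{-1}CPA'$; the function names and the numerical example are ours.

```python
import numpy as np

def regulator_step(S, F, G, H, V, T):
    """One step of the regulator Riccati recursion (S symmetric)."""
    GS = G.T @ S
    return F.T @ S @ F + H.T @ V @ H - F.T @ S @ G @ np.linalg.solve(T + GS @ G, GS @ F)

def filter_step(P, A, B, C, Q, R):
    """One step of the filter Riccati recursion (P symmetric)."""
    CP = C @ P
    return A @ P @ A.T + B @ Q @ B.T - A @ P @ C.T @ np.linalg.solve(R + CP @ C.T, CP @ A.T)

if __name__ == "__main__":
    F = np.array([[1.0, 0.2], [0.0, 0.9]])
    G = np.array([[0.0], [1.0]])
    H = np.array([[1.0, 0.0]])
    V = np.array([[2.0]])
    T = np.array([[0.5]])
    S = np.zeros((2, 2))
    P = np.zeros((2, 2))
    for _ in range(25):
        S = regulator_step(S, F, G, H, V, T)
        P = filter_step(P, F.T, H.T, G.T, V, T)   # A = F', B = H', C = G', Q = V, R = T
    print(np.abs(S - P).max())
```

Feeding the filter recursion with the transposed regulator matrices reproduces the regulator sequence step by step, which is the correspondence F with A', G with C', H with B', V with Q and T with R.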
A.15 Quadratic Regulation Problem: Continuous Time Systems

The question of how to control or regulate linear dynamic systems to
minimize quadratic costs or maximize quadratic performance indices has
been rather fully explored using several alternative approaches. We will
follow two such strands; one based on a simple matrix relationship as noted
in Bellman [1960; p.175], and the other by Dynamic Programming which is also
developed by Bellman [1957].

Suppose that the integral

     (1)  $J = \int_0^\infty x'(t)Cx(t)dt$

where

     $\dot{x}(t) = Fx(t)$

is finite, where F and C are constant n x n matrices. One way to represent
J is to look for a constant matrix P such that

     $\frac{d}{dt}\{x'(t)Px(t)\} = -x'(t)Cx(t)$

because if a (unique) P is found and if x'(t)Px(t) goes to zero as t
approaches infinity, then integrating the above we can set

     (2)  $J = x'(0)Px(0).$

The matrix P satisfying

     $F'P + PF = -C$

evidently meets our requirement.

Now extend the problem from evaluating a quadratic expression such
as J to minimizing one:

     (3)  $J = \int_0^\infty \{x'(t)Qx(t) + u'(t)Ru(t)\}dt$

where Q and R are constant matrices, Q > 0 and R > 0,
where the dynamics is now given by

     (4)  $\dot{x} = Ax + Bu.$

Suppose we restrict u to be of constant feedback type

     (5)  $u(t) = -Kx(t).$

This does not constrain our search for optimal control because the
optimal control is of this type, as we later show.

With (5), (3) and (4) are transformed into

     $J = \int_0^\infty x'(t)(Q + K'RK)x(t)dt$

where

     $\dot{x}(t) = (A - BK)x(t).$

Our result (2) then states that

     $J = x'(0)Px(0)$

where P is a solution of

     (6)  $(A - BK)'P + P(A - BK) = -(Q + K'RK).$

The best choice of K, then, must be the one that minimizes J with
respect to K. Assuming the existence of such K and the corresponding P,
denote them by K* and P* respectively. Necessarily the first order variation
of P at P* is zero in response to a small deviation of K from K*, i.e.,

     $\Delta P^* = 0$ in response to a "small" $\Delta K \neq 0$.

Taking the variation of (6) yields then

     $\Delta K'(-B'P^* + RK^*) + (-B'P^* + RK^*)'\Delta K = 0.$

Because $\Delta K$ is arbitrary, the necessary condition for K* and P* is RK* = B'P* or

     (7)  $K^* = R^{-1}B'P^*.$

Substituting (7) into (6), the matrix P* corresponding to this optimal gain
is determined by an algebraic relation

     (8)  $A'P^* + P^*A - P^*BR^{-1}B'P^* + Q = 0.$

This equation is known as an algebraic Riccati equation.
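For a numerical illustration, equation (8) can be solved through the stable invariant subspace of the associated Hamiltonian matrix. This is one standard technique and is not the derivation used in these notes; the sketch assumes NumPy, a Hamiltonian with no eigenvalues on the imaginary axis, and a function name of our own.

```python
import numpy as np

def solve_are(A, B, Q, R):
    """Stabilizing solution P of A'P + PA - P B R^{-1} B' P + Q = 0."""
    n = A.shape[0]
    Ham = np.block([[A, -B @ np.linalg.solve(R, B.T)],
                    [-Q, -A.T]])
    w, X = np.linalg.eig(Ham)
    Xs = X[:, w.real < 0]                        # basis of the stable invariant subspace
    P = np.real(Xs[n:] @ np.linalg.inv(Xs[:n]))  # P = X2 X1^{-1}
    return (P + P.T) / 2                         # remove rounding asymmetry

if __name__ == "__main__":
    # Hypothetical example: double integrator with Q = I, R = 1.
    A = np.array([[0.0, 1.0], [0.0, 0.0]])
    B = np.array([[0.0], [1.0]])
    Q = np.eye(2)
    R = np.array([[1.0]])
    P = solve_are(A, B, Q, R)
    K = np.linalg.solve(R, B.T @ P)              # K* = R^{-1} B' P*, as in (7)
    print(P)
    print(np.linalg.eigvals(A - B @ K))
```

For the double integrator the computed P has $\sqrt{3}$ on the diagonal and 1 off the diagonal, which can be verified by substitution into (8), and the closed-loop matrix A - BK has both eigenvalues in the left half plane.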







Equation (7) can also be shown to be sufficient because of the quadratic 
nature of our criterion function. 

We later show that the existence of P* , based on the assumed finiteness 
of the expression J, is guaranteed if (A, B) is a controllable pair. 
Finiteness of J implies that x'Qx + u'Ru converges to zero as t goes to 
infinity. To ensure that x(t) itself goes to zero, we need an additional 
assumption that (A, H) is an observable pair, where H is a factor such that
Q = H'H. We discuss this point elsewhere.



Dynamic Programming Formulation

With A, B, Q and R constant, the algebraic Riccati equation can also
be derived by application of the principle of optimality as we next
demonstrate. We know that $x_0'Px_0 = \int_0^\infty (x_t'Qx_t + u_t'Ru_t)dt$. Break up the integral
into two parts; one over $[0, \Delta)$ and the other $[\Delta, \infty)$ so that

     RHS $= (x_0'Qx_0 + u_0'Ru_0)\Delta + \int_\Delta^\infty (x_t'Qx_t + u_t'Ru_t)dt.$

Because the problem is time-invariant, we can write the latter as

     $\int_\Delta^\infty (x_t'Qx_t + u_t'Ru_t)dt = x_\Delta'Px_\Delta.$

The principle of optimality states that

     $x_0'Px_0 = \min_{u_0} \{(x_0'Qx_0 + u_0'Ru_0)\Delta + x_\Delta'Px_\Delta + o(\Delta)\}$

where

     $x_\Delta = x_0 + (Ax_0 + Bu_0)\Delta + o(\Delta)$

or

     $x_0'Px_0 = \min_{u_0} \{(x_0'Qx_0 + u_0'Ru_0)\Delta + x_0'Px_0$

                 $+ (Ax_0 + Bu_0)'Px_0\Delta + x_0'P(Ax_0 + Bu_0)\Delta + o(\Delta)\}.$

Cancelling the term $x_0'Px_0$ from both sides, dividing by $\Delta$ and letting it
approach zero, we obtain

     $0 = \min_{u_0} \{x_0'Qx_0 + u_0'Ru_0 + (Ax_0 + Bu_0)'Px_0 + x_0'P(Ax_0 + Bu_0)\}.$

On carrying out the minimization, we discover that

     $u_0 = -R^{-1}B'Px_0.$

Because the problem is time-invariant, we know that in general

     $u_t = -R^{-1}B'Px_t.$

Substituting the optimal control yields the same matrix Riccati equation
for P given by (8).
A.16 Maximum Principle: Discrete-Time Dynamics

A typical problem formulation is to minimize a sum of nonlinear functions

     $J_T = \sum_{k=0}^{T-1} f_k^0(z_k, x_k)$

subject to dynamic constraints

     $z_{k+1} = z_k + f_k(z_k, x_k), \quad k = 0, 1, \ldots, T-1,$

where $x_k \in U_k \subset E^r$.

The set $U_k$ is the set from which decision vectors are to be selected.
Here we assume that $f_k^0$, $f_k$ are continuously differentiable in both
arguments. In addition, further constraints may be imposed on the state
vectors to satisfy some inequality and equality equations:

     $z_0 \in Z_0 = Z_0' \cap Z_0''$

where

     $Z_0' = \{z: q_0(z) \leq 0\},$

     $Z_0'' = \{z: g_0(z) = 0\},$

     $z_t \in Z_t = \{z: q_t(z) \leq 0\},$

and

     $z_T \in Z_T = Z_T' \cap Z_T''$

where

     $Z_T' = \{z: q_T(z) \leq 0\},$

     $Z_T'' = \{z: g_T(z) = 0\}.$

We assume that, unless identically zero, $\partial g_0/\partial z$ and $\partial g_T/\partial z$ have maximum
ranks on $Z_0''$ and $Z_T''$, respectively. Additionally, technical conditions are
needed to ensure that gradient vectors of active constraints are linearly
independent (so that $Z_t$, t = 0, ..., T, may have the right kind of conical
approximations) (see Canon et al. [1970]), and to have conical approximations
for the set $U_t$, t = 0, 1, ..., T-1.

With these assumptions we can state the necessary condition for optimal
decisions which is the discrete dynamics version of Pontryagin's maximum
principle. (Canon et al. [1970])

Theorem (Discrete Maximum Principle)

There exist costate vectors $p_0, p_1, \ldots, p_T$ in $E^n$, multiplier
vectors $\lambda_0, \ldots, \lambda_T$, $\lambda_t \leq 0$, $\mu_0$, $\mu_T$, and a scalar $p^0 \leq 0$,
such that (i) not all of $p^0, p_0, \ldots, p_T$ and $\mu_0$, $\mu_T$ are zero,

(ii) $p_t$ satisfies

     $p_t - p_{t+1} = [\partial f_t(z_t^*, x_t^*)/\partial z]'p_{t+1} + p^0[\partial f_t^0(z_t^*, x_t^*)/\partial z]'$

                    $+ [\partial q_t(z_t^*)/\partial z]'\lambda_t, \quad t = 0, 1, \ldots, T-1,$

(iii) subject to the transversality conditions

     $p_0 = -[\partial g_0(z_0^*)/\partial z]'\mu_0$

     $p_T = [\partial g_T(z_T^*)/\partial z]'\mu_T + [\partial q_T(z_T^*)/\partial z]'\lambda_T$

and

     $\langle\lambda_t, q_t(z_t^*)\rangle = 0, \quad t = 0, 1, \ldots, T$

and

(iv) $x_t^*$ maximizes $H(z, x, p, p^0, t) = p^0f_t^0(z, x) + \langle p, f_t(z, x)\rangle$, i.e.,

     $H(z_t^*, x_t^*, p_{t+1}, p^0, t) \geq H(z_t^*, x_t, p_{t+1}, p^0, t)$ for all $x_t \in U_t$.
A.17 Policy Reaction Functions, Stabilization Policy and Modes

Lucas' critique (Lucas [1976]) of a common econometric practice seems to
have had much impact on the econometric profession. For those who have known 
that (state variable) feedbacks modify dynamic characteristics his critique is 
not a bit surprising. In fact policy reaction functions or any sort of feedbacks 
between policies and the "state" of the economy will modify the locations of 
the eigenvalues which govern (linearized or variational) dynamics of the economy, 
hence change the structure of dynamics. 

For a simple illustration of this point, let

     $\dot{z} = Az + bu$

where b is an n-dimensional vector, and u is a scalar exogenous variable.

The dynamic characteristics are determined by the eigenvalues of A which
are the roots of the characteristic equation $d(\lambda) = |\lambda I - A| = \lambda^n + \alpha_1\lambda^{n-1}
+ \cdots + \alpha_n$. Now suppose that u is generated by k'z + v where k'z is an
automatic, i.e., a reaction part of the policy variable and v is the
discretionary or exogenous part. The differential equation now changes into

     $\dot{z} = (A + bk')z + bv.$

The eigenvalues of A + bk' are the roots of $d_k(\lambda) = |\lambda I - A - bk'|$. Rewrite
it as follows (see Bass and Gura [1965]) to reveal the effects
of k more clearly:

     (+)  $d_k(\lambda) = |\lambda I - A|\,|I - (\lambda I - A)^{-1}bk'|$

                       $= d(\lambda)\{1 - k'(\lambda I - A)^{-1}b\}.$

The second line follows from a well-known matrix identity that $|I + ab'|
= 1 + b'a$. (See Aoki [1976; p.389], for example.)

Next, note that $(\lambda I - A)^{-1}$ can be written as

     $(\lambda I - A)^{-1} = \mathrm{adj}(\lambda I - A)/d(\lambda)$

where the matrices $B_1, \ldots, B_{n-1}$ in the expression

     $\mathrm{adj}(\lambda I - A) = \lambda^{n-1}I + B_1\lambda^{n-2} + \cdots + B_{n-1}$

are recursively generated by equating the like powers in $\lambda$* in

     $d(\lambda)I = (\lambda I - A)\,\mathrm{adj}(\lambda I - A),$

     $B_i = B_{i-1}A + \alpha_iI, \quad i = 1, \ldots, n-1,$

     $B_0 = I$

or $B_1 = A + \alpha_1I$, $B_2 = A^2 + \alpha_1A + \alpha_2I$, etc.,
where

     $d(\lambda) = \lambda^n + \alpha_1\lambda^{n-1} + \cdots + \alpha_n.$

Equation (+) now states that

     $d(\lambda) - d_k(\lambda) = (k'b)\lambda^{n-1} + k'(A + \alpha_1I)b\lambda^{n-2} + \cdots + k'B_{n-1}b.$

Let $d_k(\lambda)$ be $\lambda^n + \beta_1\lambda^{n-1} + \cdots + \beta_n$. Then

     $\alpha_i - \beta_i = k'B_{i-1}b, \quad i = 1, \ldots, n.$

When the $B_i$'s are substituted out, we can arrange them as

     $(\alpha_1, \ldots, \alpha_n) - (\beta_1, \ldots, \beta_n) = k'[b, Ab, \ldots, A^{n-1}b]\begin{pmatrix} 1 & \alpha_1 & \cdots & \alpha_{n-1} \\ 0 & 1 & \ddots & \vdots \\ \vdots & & \ddots & \alpha_1 \\ 0 & \cdots & 0 & 1 \end{pmatrix}.$

If a desired configuration of eigenvalues is specified, then the $\beta$'s are determined.
If the matrix $[b, Ab, \ldots, A^{n-1}b]$ is nonsingular, then the above equation
can uniquely be solved for k given the $\beta_i$'s and $\alpha_i$'s, i.e., the desired eigenvalue
pattern is realized with this k. Or, the equation can be read to state that
for a given choice of k, the dynamics of A is modified in such a way that
the coefficients of the characteristic polynomials satisfy the above equation.



Dynamic characteristics are then influenced by policy regimes. As Sargent
states, the matrix A should not have been regarded as "structural" parameters
since they do not remain invariant in the face of policy interventions.

* This is known as Leverrier's algorithm (Aoki [1976; p.45]).
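How a reaction coefficient vector k relocates the eigenvalues can be computed explicitly. The sketch below assumes NumPy and the feedback convention u = k'z + v, so that the closed-loop matrix is A + bk' and $|\lambda I - A - bk'| = d(\lambda)\{1 - k'(\lambda I - A)^{-1}b\}$; with $\alpha_i$ and $\beta_i$ the open-loop and desired characteristic coefficients, this gives $(\alpha - \beta)' = k'[b, Ab, \ldots, A^{n-1}b]W$, where W is upper triangular Toeplitz with first row $(1, \alpha_1, \ldots, \alpha_{n-1})$. The function name and the examples are hypothetical.

```python
import numpy as np

def bass_gura(A, b, desired_poles):
    """Gain k such that A + b k' has the desired eigenvalues (single input)."""
    n = A.shape[0]
    alpha = np.real(np.poly(A))[1:]                   # alpha_1, ..., alpha_n
    beta = np.real(np.poly(desired_poles))[1:]        # beta_1, ..., beta_n
    ctrb = np.column_stack([np.linalg.matrix_power(A, i) @ b for i in range(n)])
    coeffs = np.concatenate(([1.0], alpha[:-1]))      # 1, alpha_1, ..., alpha_{n-1}
    W = np.array([[coeffs[j - i] if j >= i else 0.0 for j in range(n)]
                  for i in range(n)])
    # Solve k'(ctrb W) = (alpha - beta)' for k.
    return np.linalg.solve((ctrb @ W).T, alpha - beta)

if __name__ == "__main__":
    A = np.array([[0.0, 1.0], [-2.0, -3.0]])
    b = np.array([0.0, 1.0])
    k = bass_gura(A, b, [-4.0, -5.0])
    print(k, np.linalg.eigvals(A + np.outer(b, k)))
```

The solve step requires the controllability matrix to be nonsingular, which is the condition stated in the text for k to be uniquely determined.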
A.18 Dynamic Policy Multipliers

Effects of changing instruments on endogenous variables figure importantly
in any contemplated policy changes. Not only the contemporaneous and long-run
effects, i.e., the effects felt at the same time as the instrument changes and when
all adjustments have worked themselves out in the model, but also the processes of
adjustment must be evaluated. Dynamic multipliers do precisely that. Elasticity
is intended to tell us how the economy will respond to a change in a given
situation. Naturally this response will depend on the situation in general.
Variational dynamics, which we later introduce, generalize the notion of elasticity to
a dynamic context, because variational dynamics tell us how the economy will
respond over time to changes in a given situation, where the situation is to be
understood dynamically, i.e., the situation is a set of time paths of endogenous
and instrument variables. We have called them reference time paths. Just as
elasticity depends on the situation, the time behavior of variational dynamics will
depend on the time paths of the situation. Dynamic multipliers depict the time
profile of policy effects, and can be thought of as extending the elasticity concept to
dynamic situations. Dynamic multipliers, therefore, are a particular example of
the elasticity of a time path.

To appreciate the simplicity of the state space representation, try calculating
dynamic multipliers using the two alternate representations, i.e., calculate
the effects of changing inputs at time $\tau$ by some finite amount on the
contemporaneous outputs, i.e., outputs at time $\tau$, as well as on all subsequent times,
$t > \tau$, using models in state space form and in ARMA form. This comparison
should convince the reader of the messier manipulation needed by the ARMA
representation. (Aoki [1981; Appendix A] conducts such a comparison.) Also
intertemporal optimization is easier to conduct when dynamic models are in
state space form, especially using dynamic programming.

Dynamic multiplier calculations on state space models typically proceed
as follows: We wish to compare effects of two alternate input sequences $\{u_t^i\}$,
i = 1, 2, on $y_t$ using a model

     $x_{t+1}^i = Ax_t^i + Bu_t^i,$

     $y_t^i = Cx_t^i + Du_t^i, \quad i = 1, 2.$

Then

     $y_t^1 - y_t^2 = C(x_t^1 - x_t^2) + D(u_t^1 - u_t^2),$

where

     $x_t^1 - x_t^2 = \sum_{\tau=0}^{t-1} A^{t-1-\tau}B(u_\tau^1 - u_\tau^2)$

because, to be fair, we must compare two alternative policies on the system
starting from the same state at the initial time 0. Suppose

     $u_\tau^1 - u_\tau^2 = 0$ for $\tau < s$ and $\tau > s$, and $u_\tau^1 - u_\tau^2 = \Delta u$ for $\tau = s$.

Then

     $y_s^1 - y_s^2 = D\Delta u,$

and

     $y_t^1 - y_t^2 = CA^{t-1-s}B\Delta u, \quad t > s.$

This expression $CA^{t-1-s}B$ is the dynamic multiplier (matrix). It is also
known as the Markovian parameters. We later return to this expression,
because they are also important in constructing state space models to
(approximately) reproduce given input and output data sequences.
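The calculation just described can be reproduced with a short simulation. The sketch assumes NumPy; the function names, the model matrices and the size of the input change are hypothetical. Two input paths that differ only at time s are run from the same initial state, and the difference of the outputs is compared with D at lag zero and with $CA^{t-1-s}B$ at later lags.

```python
import numpy as np

def simulate(A, B, C, D, u, x0):
    """Outputs y_0, ..., y_{T-1} of x_{t+1} = A x_t + B u_t, y_t = C x_t + D u_t."""
    x, ys = x0.copy(), []
    for ut in u:
        ys.append(C @ x + D @ ut)
        x = A @ x + B @ ut
    return np.array(ys)

def dynamic_multiplier(A, B, C, D, lag):
    """Effect on y_{s+lag} of a unit change in u_s: D if lag = 0, else C A^{lag-1} B."""
    if lag == 0:
        return D
    return C @ np.linalg.matrix_power(A, lag - 1) @ B

if __name__ == "__main__":
    A = np.array([[0.5, 0.1], [0.0, 0.3]])
    B = np.array([[1.0], [0.5]])
    C = np.array([[1.0, 1.0]])
    D = np.array([[0.2]])
    s, du = 2, np.array([0.7])
    u2 = np.ones((8, 1))
    u1 = u2.copy()
    u1[s] += du                                   # the two policies differ only at time s
    x0 = np.array([1.0, -1.0])
    diff = simulate(A, B, C, D, u1, x0) - simulate(A, B, C, D, u2, x0)
    for t in range(s, 8):
        print(t, diff[t], dynamic_multiplier(A, B, C, D, t - s) @ du)
```

The common initial state and the common part of the input path cancel in the difference, so only the multipliers remain.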
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